#Overview & Goals
This markdown file will be the place where we can conduct our exploratory data analysis.
Below you will find the questions that we are using to explore the data and who is responsible for conducting the data analysis associated with that question.
By the end of this process we will have, as a team:
Imported the cleaned dataset from Lab 3.
Examined all of the EDA questions, with clear code & written explanations of the results.
Created data visualizations for the various EDAs.
Prepared a coherent final presentation (in a separate PowerPoint created by Lilly), along with a clear assignment of who is presenting what.
LINK TO SLIDES: https://docs.google.com/presentation/d/1Z7EedURd1mhE6WoUVpifnr4v6w-2GmTj2a6IKPiLOHg/edit?usp=sharing
*General notes: Some example code for cleaning and data analysis will be pulled into the presentation, so make sure you are annotating code as you go so we can explain what each line does if needed.
Keep committing your progress with clear messages so we can see contributions and troubleshoot errors if needed.
#Task Assignment
Lilly is going to be working on this question
Edwin is going to be working on this question
Bingtang is going to be working on this question
The original question looked at the impact of data cleaning on the analysis. The rows that needed cleaning were not a significant portion of the total dataset, so the summary statistics did not change significantly before vs. after cleaning.
#Setup
##Packages
library(tidyverse)
library(arrow)
library(logger)
library(glue)
library(dplyr)
library(tidyr)
library(rlang)
library(lubridate)
library(tictoc)
library(here)
library(jsonlite)
library(scales)
library(knitr)
library(kableExtra)
library(DT)
library(tigris) # for zctas()
library(sf) # for spatial functions
library(stringr) # for string cleaning
# source(here("src", "data_cleaning.r"))
library(zipcodeR)
library(ggridges) # ridge plots
library(viridis) # colorblind-safe palettes(maps)
library(mgcv) # GAM/BAM modeling
library(pROC)
source(here("src", "data_cleaning.r")) # Functions loaded at runtime - linter warnings are false positives
#Lilly imported these from Lab 3, please feel free to add any additional libraries we might need.
*Note: please try to use tidyverse packages and functions, we want to make sure everyone is familiar with the functions we are using.
#Import the cleaned dataset
#READ THIS BEFORE RUNNING CODE
#Since we cannot "save" the cleaned data in Lab 03, please add the line saveRDS(bids_clean, here::here("data", "bids_clean.rds")) at line 1318 and run it, but do NOT commit that change. Then you can run the readRDS() line below.
# bids_clean <- readRDS(here::here("data", "bids_clean.rds"))
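The save-then-load handoff described in the comments above can be sketched as follows. This is a minimal illustration on a toy data frame written to a temporary file; the toy values and the file name `bids_clean_example.rds` are assumptions for the example, not the real Lab 03 output.

```r
# Toy stand-in for the cleaned bids data (hypothetical values)
toy_bids <- data.frame(
  row_id = 1:3,
  PRICE_final = c(0.04, 0.06728, 0.23)
)

# Write to a temporary location so nothing in the repo is touched
rds_path <- file.path(tempdir(), "bids_clean_example.rds")
saveRDS(toy_bids, rds_path)

# Read it back; an .rds file preserves column types and attributes
restored <- readRDS(rds_path)
print(identical(toy_bids, restored))
```

Because `.rds` keeps R types intact (dates, factors, list columns), it avoids re-running the cleaning pipeline just to recover the same object.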
data_cleaning_pipeline <- function(df, expected_columns, zip_code_db = NULL, save_path = NULL, verbose = TRUE) {
df <- clean_price_column(df,
min_price = 0,
max_price = 10,
fix_leading_o = TRUE,
verbose = verbose)
df <- clean_geo_region_column(df,
verbose = verbose)
df <- clean_zip_column(df,
zip_code_db = zip_code_db,
verbose = verbose)
df <- clean_response_time_column(df,
col_name = "RESPONSE_TIME",
output_col_name = "RESPONSE_TIME_clean",
extract_digits = TRUE,
verbose = verbose)
df <- clean_timestamp_column(df,
col_name = "TIMESTAMP",
verbose = verbose)
df <- clean_city_column(df,
zip_code_db = zip_code_db,
verbose = verbose)
df <- clean_geo_coordinates_column(df,
verbose = verbose)
df <- clean_bids_won_column(df,
verbose = verbose)
df <- clean_date_column(df,
col_name = "DATE_UTC",
output_col_name = "DATE_UTC_clean",
verbose = verbose)
df <- clean_device_type_column(df,
col_name = "DEVICE_TYPE",
output_col_name = "DEVICE_TYPE_clean",
verbose = verbose)
df <- clean_response_time_column(df,
col_name = "RESPONSE_TIME",
output_col_name = "RESPONSE_TIME_clean",
extract_digits = TRUE,
verbose = verbose)
df <- clean_requested_sizes_column(df,
col_name = "REQUESTED_SIZES",
output_col_name = "REQUESTED_SIZES_clean",
verbose = verbose)
duplicate_handler <- remove_duplicates(df,
exclude_cols = c("row_id"),
verbose = verbose)
df <- duplicate_handler[["df"]]
removed_indices <- duplicate_handler[["removed_indices"]]
if (!is.null(save_path)) {
write_parquet(df, save_path)
}
return(df)
}
# ---- Timing Start ----
run_time <- system.time({
#------------------------------------------------------
# LOAD EXPECTED BIDS COLUMNS FROM CSV
#------------------------------------------------------
expected_columns <- data.frame(readr::read_csv(
here::here("src", "expected_columns.csv"),
col_types = "ccc"
))
#------------------------------------------------------
# LOAD BIDS DATA FROM PARQUET
#------------------------------------------------------
cat("\n")
cat(strrep("=", 70), "\n")
cat("STARTING BIDS DATA PROCESSING\n")
cat(strrep("=", 70), "\n\n")
# Load data
cat("Loading data...\n")
original_bids <- read_parquet(here("data", "bids_data_vDTR.parquet"))
bids <- original_bids %>% mutate(row_id = row_number())
cat(glue("Loaded {nrow(original_bids)} rows and {ncol(original_bids)} columns\n\n"))
#------------------------------------------------------
# LOAD ZIPCODE DATA
#------------------------------------------------------
# Load ZIP → city lookup from zipcodeR
zip_code_db <- load_oregon_zips()
#------------------------------------------------------
# CHECK FOR MISSING COLUMNS
#------------------------------------------------------
missing_columns <- check_columns(bids, expected_columns$column)
# Trailing "\n" keeps the next printed table from starting on the same line
cat(glue::glue("There are {length(missing_columns)} missing column(s): \n {paste(missing_columns, collapse = ', ')}"), "\n")
#------------------------------------------------------
# TYPE SUMMARY
#------------------------------------------------------
bids_type_summary <- check_column_types(bids, expected_columns)
print(bids_type_summary)
save_path <- NULL
# save_path <- here("data", "bids_data_vDTR_clean.parquet")
bids <- data_cleaning_pipeline(bids, expected_columns, zip_code_db, save_path, verbose = TRUE)
cat("\n")
cat(strrep("=", 70), "\n")
cat("CREATING FINAL CLEANED DATASET\n")
cat(strrep("=", 70), "\n\n")
# Create final cleaned dataset
bids_clean <- bids %>%
select(
row_id,
DATE_UTC_clean,
TIMESTAMP_clean,
AUCTION_ID,
PUBLISHER_ID,
PRICE_final,
DEVICE_GEO_REGION_clean,
DEVICE_GEO_ZIP_clean,
DEVICE_GEO_CITY_clean,
DEVICE_GEO_LAT_clean,
DEVICE_GEO_LONG_clean,
BID_WON_clean,
RESPONSE_TIME_clean,
DEVICE_TYPE_clean,
SIZE,
REQUESTED_SIZES_clean
)
# %>%
# rename_with(~ str_remove(., "(_clean|_final)$"))
print(class(bids_clean))
glimpse(bids_clean)
# NA counts per column
na_count_by_col <- colSums(is.na(bids_clean %>% select(-REQUESTED_SIZES_clean)))
cat("\nNA Counts per Column:\n")
cat(strrep("=", 70), "\n")
print(na_count_by_col)
total_na_rows <- sum(!complete.cases(bids_clean %>% select(-REQUESTED_SIZES_clean)))
print(glue("Total NA rows: {total_na_rows}"))
}) # ---- Timing End ----
##
## ======================================================================
## STARTING BIDS DATA PROCESSING
## ======================================================================
##
## Loading data...
## Loaded 443969 rows and 15 columns
##
## Loading Oregon ZCTA data
## Loading Oregon ZCTA data from cached parquet...
## There are 1 missing column(s):
##  DEVICE_GEO_COUNTRY
##               column    actual Expected_Type match  Notes_Actual_Type
## 1          TIMESTAMP character       POSIXct FALSE      TIMESTAMP_NTZ
## 2           DATE_UTC character          Date FALSE               DATE
## 3         AUCTION_ID character     character  TRUE            VARCHAR
## 4       PUBLISHER_ID character     character  TRUE            VARCHAR
## 5        DEVICE_TYPE   integer     character FALSE            VARCHAR
## 6  DEVICE_GEO_REGION character     character  TRUE         VARCHAR(2)
## 7    DEVICE_GEO_CITY character     character  TRUE            VARCHAR
## 8     DEVICE_GEO_ZIP character     character  TRUE        VARCHAR(10)
## 9     DEVICE_GEO_LAT   numeric       numeric  TRUE              FLOAT
## 10   DEVICE_GEO_LONG   numeric       numeric  TRUE              FLOAT
## 11   REQUESTED_SIZES character          list FALSE VARCHAR (or ARRAY)
## 12              SIZE character     character  TRUE            VARCHAR
## 13             PRICE character       numeric FALSE       NUMBER(12,6)
## 14     RESPONSE_TIME character       integer FALSE       NUMBER(10,0)
## 15           BID_WON character       logical FALSE            BOOLEAN
##
## ============================================================
## Converting PRICE to numeric
## ============================================================
## Applying preprocessing...
## Found 1 non-numeric value(s), attempting to fix...
## Converting PRICE...
## PRICE_clean is now: numeric
##
## NA COUNT:
## There are 0 NAs in PRICE_clean
##
##
## ============================================================
## Cleaning DEVICE_GEO_REGION column
## ============================================================
## Current values in DEVICE_GEO_REGION:
## Or OR oregon xor <NA>
## 53689 333826 41513 14941 0
## # A tibble: 3 × 2
## region_lower n
## <chr> <int>
## 1 or 387515
## 2 oregon 41513
## 3 xor 14941
##
## Number of NA values in DEVICE_GEO_REGION_clean: 0
##
## ============================================================
## Cleaning DEVICE_GEO_ZIP column
## ============================================================
## Current values in DEVICE_GEO_ZIP:
## --------------------------------------------------
## ZIP CODE RECOVERY REPORT
## --------------------------------------------------
## Original missing (NA): 21198
## Original sentinels: 18
## Total bad ZIPs: 21216
## Spatial join matches: 440849 (points matched to ZCTA polygons)
## Recovered ZIPs: 21196
## Remaining NA: 20
## Recovery rate: 99.9%
## --------------------------------------------------
##
##
## ============================================================
## Converting RESPONSE_TIME to integer
## ============================================================
## Applying preprocessing...
## Extracting digits from string...
## Converting RESPONSE_TIME...
## RESPONSE_TIME_clean is now: integer
##
## NA COUNT:
## There are 0 NAs in RESPONSE_TIME_clean
##
##
## ============================================================
## Converting TIMESTAMP_clean to POSIXct
## ============================================================
## Converting TIMESTAMP_clean...
## TIMESTAMP_clean is now: POSIXct
##
## NA COUNT:
## There are 0 NAs in TIMESTAMP_clean
##
##
## ============================================================
## Cleaning DEVICE_GEO_CITY column
## ============================================================
##
## --------------------------------------------------
## CITY RECOVERY REPORT
## --------------------------------------------------
## Original missing: 21198
## Recovered via ZIP: 21196
## Remaining NA: 2
## --------------------------------------------------
## Unmatched ZIPs: 0 unique values
## Top unmatched ZIPs:
## # A tibble: 0 × 2
## # ℹ 2 variables: DEVICE_GEO_ZIP_clean <chr>, n <int>
##
## ============================================================
## Cleaning DEVICE_GEO_LAT and DEVICE_GEO_LONG columns
## ============================================================
## Latitudes are consistent with Oregon.
## Longitudes include locations outside Oregon.
## Number of implausible coordinates: 100
##
## ============================================================
## Cleaning BID_WON column
## ============================================================
## Current values in BID_WON:
##
## FALSE true TRUE <NA>
## 323285 10 120674 0
##
## Current values in BID_WON_clean:
## FALSE TRUE <NA>
## 323285 120684 0
##
## ============================================================
## Converting DATE_UTC to Date
## ============================================================
## Converting DATE_UTC...
## DATE_UTC_clean is now: Date
##
## NA COUNT:
## There are 0 NAs in DATE_UTC_clean
##
##
## ============================================================
## Converting DEVICE_TYPE to character
## ============================================================
## Converting DEVICE_TYPE...
## DEVICE_TYPE_clean is now: character
##
## NA COUNT:
## There are 0 NAs in DEVICE_TYPE_clean
##
##
## ============================================================
## Converting RESPONSE_TIME to integer
## ============================================================
## Applying preprocessing...
## Extracting digits from string...
## Converting RESPONSE_TIME...
## RESPONSE_TIME_clean is now: integer
##
## NA COUNT:
## There are 0 NAs in RESPONSE_TIME_clean
##
##
## ============================================================
## Converting REQUESTED_SIZES to list
## ============================================================
## Parsing JSON elements...
## REQUESTED_SIZES is now: list
##
##
## ============================================================
## Removing duplicate rows
## ============================================================
## Removed 2434 duplicate rows.
## Remaining rows: 441535
##
## ======================================================================
## CREATING FINAL CLEANED DATASET
## ======================================================================
##
## [1] "tbl_df" "tbl" "data.frame"
## Rows: 441,535
## Columns: 16
## $ row_id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,…
## $ DATE_UTC_clean <date> 2025-10-21, 2025-10-21, 2025-10-21, 2025-10-2…
## $ TIMESTAMP_clean <dttm> 2025-10-21 23:42:37, 2025-10-21 23:42:37, 202…
## $ AUCTION_ID <chr> "0000060c-b8a9-414b-aeae-5f841472d6bb", "00000…
## $ PUBLISHER_ID <chr> "LteIcOiSsaE5", "LteIcOiSsaE5", "LteIcOiSsaE5"…
## $ PRICE_final <dbl> 0.04000000, 0.06728000, 0.23000000, 0.04475100…
## $ DEVICE_GEO_REGION_clean <chr> "OR", "OR", "OR", "OR", "OR", "OR", "OR", "OR"…
## $ DEVICE_GEO_ZIP_clean <chr> "97302", "97302", "97302", "97302", "97302", "…
## $ DEVICE_GEO_CITY_clean <chr> "Salem", "Salem", "Salem", "Salem", "Salem", "…
## $ DEVICE_GEO_LAT_clean <dbl> 44.9036, 44.9036, 44.9036, 44.9036, 44.9036, 4…
## $ DEVICE_GEO_LONG_clean <dbl> -123.0461, -123.0461, -123.0461, -123.0461, -1…
## $ BID_WON_clean <chr> "FALSE", "FALSE", "TRUE", "FALSE", "FALSE", "F…
## $ RESPONSE_TIME_clean <int> 259, 86, 80, 259, 461, 252, 361, 173, 178, 92,…
## $ DEVICE_TYPE_clean <chr> "4", "4", "4", "4", "4", "1", "1", "1", "1", "…
## $ SIZE <chr> "320x50", "320x50", "320x50", "320x50", "320x5…
## $ REQUESTED_SIZES_clean <list> <"320x50", "300x50">, <"320x50", "300x50">, <…
##
## NA Counts per Column:
## ======================================================================
## row_id DATE_UTC_clean TIMESTAMP_clean
## 0 0 0
## AUCTION_ID PUBLISHER_ID PRICE_final
## 0 0 452
## DEVICE_GEO_REGION_clean DEVICE_GEO_ZIP_clean DEVICE_GEO_CITY_clean
## 0 20 2
## DEVICE_GEO_LAT_clean DEVICE_GEO_LONG_clean BID_WON_clean
## 0 100 0
## RESPONSE_TIME_clean DEVICE_TYPE_clean SIZE
## 0 0 0
## Total NA rows: 570
cat(glue::glue("\n\nTotal runtime for data cleaning: {round(run_time[['elapsed']], 2)} seconds\n"))
##
## Total runtime for data cleaning: 30.08 seconds
#Run Exploratory Data Analysis
*How does bidding volume change across hours of the day and days of the week?
# Extract hour and day of week
bids_clean <- bids_clean %>%
mutate(
hour = hour(TIMESTAMP_clean),
day_of_week = wday(TIMESTAMP_clean, label = TRUE, abbr = FALSE, week_start = 1),
day_of_week = factor(day_of_week,
levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"),
ordered = FALSE)
)
# Summarize bid volume by hour
hourly_volume <- bids_clean %>%
count(hour)
all_days <- factor(
c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"),
levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"),
ordered = FALSE
)
# Summarize bid volume by day of week (include 0s)
daily_volume <- bids_clean %>%
count(day_of_week) %>%
complete(day_of_week = all_days, fill = list(n = 0))
# Plot: Bidding volume by hour
ggplot(hourly_volume, aes(x = hour, y = n, fill = as.factor(hour))) +
geom_col(show.legend = FALSE) +
scale_x_continuous(
breaks = 0:23,
labels = format(strptime(0:23, format = "%H"), format = "%I %p")
) +
scale_fill_viridis_d(option = "cividis") +
labs(
title = "Bidding Volume by Hour of Day",
x = "",
y = "# of Bids"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Plot: Bidding volume by day of week
ggplot(daily_volume, aes(x = day_of_week, y = n, fill = day_of_week)) +
geom_col(show.legend = FALSE) +
scale_y_continuous(labels = label_comma()) +
scale_fill_viridis_d(option = "cividis") +
labs(
title = "Bidding Volume by Day of Week",
x = "",
y = "# of Bids"
) +
theme_minimal()
Comments: At first I thought there was something wrong since we are only seeing results from a Tuesday and a Wednesday, but after investigation, the original dataset given to us only has data from 10/21/2025 and 10/22/2025, so this makes sense. This will limit the conclusions we can draw from the data, but it is not an error.
Note: For the plots I intentionally added a minimal theme and made sure that the color schemes were color-blind friendly and accessible. Specifically, “cividis” is designed to stay readable across all types of color vision, including grayscale.
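The date-coverage check mentioned in the comments above can be reproduced with a few lines of base R. The sketch below uses three made-up timestamps as a stand-in for `TIMESTAMP_clean`; on the real data the same idea would be applied to the full column.

```r
# Hypothetical timestamps standing in for TIMESTAMP_clean
toy_ts <- as.POSIXct(
  c("2025-10-21 23:42:37", "2025-10-22 01:05:00", "2025-10-22 02:30:00"),
  tz = "UTC"
)

# Collapse timestamps to calendar dates and count rows per date
toy_dates <- as.Date(toy_ts, tz = "UTC")
date_counts <- table(toy_dates)
print(date_counts)

# Number of distinct calendar days covered by the data
n_days <- length(unique(toy_dates))
print(n_days)
```

If only two distinct dates come back, a day-of-week plot with two non-empty bars is expected rather than a sign of a cleaning problem.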
*When are the peak bidding periods, and what might explain those spikes?
# Identify top 3 bidding hours
top_hours <- hourly_volume %>%
top_n(3, n) %>%
arrange(desc(n))
# Identify peak day(s)
top_days <- daily_volume %>%
top_n(1, n)
print(top_hours)
## # A tibble: 3 × 2
## hour n
## <int> <int>
## 1 2 67111
## 2 3 61514
## 3 1 60233
print(top_days)
## # A tibble: 1 × 2
## day_of_week n
## <fct> <int>
## 1 Wednesday 273779
The top three peak bidding hours are 2am, 3am, and 1am, with 67,111 bids, 61,514 bids, and 60,233 bids respectively. How to read these hours depends on the time zone of TIMESTAMP_clean, which we have not confirmed. If the timestamps are stored in UTC (as the DATE_UTC column name suggests), then 1am to 3am UTC corresponds to 6pm to 8pm Pacific Daylight Time, and since all the bids come from OR, the peak would simply reflect ordinary evening browsing. If the timestamps are instead in local time, spikes at these early hours are unlikely to come from human activity and would point to scheduled systems, ad bots, or backlogged jobs.
Wednesday had the highest volume with 273,779 bids, but again we only have data from two days of the week so that does not mean much here.
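Since the hour-of-day reading depends on the time zone of the timestamps, it may be worth converting to Pacific time before interpreting the peaks. The sketch below assumes the timestamps are UTC (suggested by the `DATE_UTC` column name, but not confirmed) and uses a single made-up timestamp.

```r
# Hypothetical timestamp, assumed to be recorded in UTC
ts_utc <- as.POSIXct("2025-10-22 02:15:00", tz = "UTC")

# Hour of day as recorded (UTC) and as experienced in Oregon (Pacific time)
hour_utc <- as.integer(format(ts_utc, "%H", tz = "UTC"))
hour_pacific <- as.integer(format(ts_utc, "%H", tz = "America/Los_Angeles"))

print(c(utc = hour_utc, pacific = hour_pacific))

# With lubridate, the equivalent on the real column would be along the lines of
# hour(with_tz(TIMESTAMP_clean, "America/Los_Angeles"))
```

On 2025-10-22 Oregon is on daylight time (UTC-7), so a 2am UTC bid is a 7pm local bid on the previous calendar day.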
# Ridgeline plot of price distributions for won vs lost bids
ggplot(bids_clean, aes(y = BID_WON_clean, x = PRICE_final, fill = BID_WON_clean)) +
  geom_density_ridges(alpha = 0.7, scale = 1.2) + # ridge density curves
  # One fill scale only: a second scale_fill_*() call would replace the first
  scale_fill_viridis_d(
    option = "cividis", # colorblind-safe palette
    labels = c("FALSE" = "Lost Bid", "TRUE" = "Won Bid"), # legend labels
    name = "Bid Outcome"
  ) +
  scale_y_discrete(labels = c("FALSE" = "Lost Bids", "TRUE" = "Won Bids")) +
  labs(
    title = "Bid Price Distribution by Bid Outcome", # plot title
    subtitle = "Ridgeline plot highlights distribution shapes and overlap",
    x = "Bid Price", # x-axis label
    y = "Bid Outcome" # y-axis label
  ) +
  theme_minimal(base_size = 13) + # clean theme
  theme(plot.title = element_text(face = "bold")) # bold title
The ridgeline plot gives a clear picture of how bid prices differ between winning and losing bids. Most losing bids are packed tightly at the lower end of the price range, while winning bids tend to extend into slightly higher prices. There is some overlap at very low bid amounts, but overall the curve for winning bids stretches farther to the right. This pattern suggests that higher bid prices generally improve the chances of winning.
# Violin + Boxplot comparing bid prices for wins vs losses
ggplot(bids_clean, aes(x = BID_WON_clean, y = PRICE_final, fill = BID_WON_clean)) +
  geom_violin(trim = FALSE, alpha = 0.6) + # violin shows price distribution shape
  geom_boxplot(width = 0.15, outlier.shape = NA, alpha = 0.9) + # boxplot shows median + IQR
  # One fill scale only (cividis); the legend is hidden below, so no legend labels are needed
  scale_fill_viridis_d(option = "cividis") +
  scale_x_discrete(labels = c("FALSE" = "Lost Bids", "TRUE" = "Won Bids")) +
  labs(
    title = "Distribution of Bid Prices by Bid Outcome", # plot title
    subtitle = "Violin + Boxplot highlight price differences",
    x = "Bid Outcome",
    y = "Bid Price"
  ) +
  theme_minimal(base_size = 13) + # clean theme
  theme(
    plot.title = element_text(face = "bold", size = 15), # bold title
    strip.text = element_text(face = "bold"),
    legend.position = "none" # hide legend (labels on x-axis)
  )
The violin–boxplot shows clear differences in how bid prices behave for winning and losing bids. Losing bids are mostly concentrated at very low prices, with only a few stretching upward. In contrast, winning bids tend to be higher overall, with a wider spread and a noticeably higher median. While there is still some overlap at the lower end, the shape of the distributions shows that higher bid prices are much more common among winning bids. Overall, the plot suggests that higher bids are associated with better odds of winning, although the plot alone does not show how much a given price increase improves those odds.
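The visual comparison of medians can be backed by a quick numeric summary. The sketch below uses a small made-up data frame (the `toy_bids` values are purely illustrative, not results from our data) to show the pattern of the calculation.

```r
# Hypothetical bids: outcome stored as character, as in BID_WON_clean
toy_bids <- data.frame(
  won   = c("FALSE", "FALSE", "FALSE", "TRUE", "TRUE", "TRUE"),
  price = c(0.02, 0.04, 0.05, 0.06, 0.20, 0.23)
)

# Median price per outcome group
median_by_outcome <- tapply(toy_bids$price, toy_bids$won, median)
print(median_by_outcome)

# Difference in medians (won minus lost)
median_gap <- median_by_outcome[["TRUE"]] - median_by_outcome[["FALSE"]]
print(median_gap)
```

On the real data the same summary could be written with `group_by(BID_WON_clean)` and `summarise(median(PRICE_final, na.rm = TRUE))` to stay within the tidyverse.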
# Boxplot of price distribution across publishers
ggplot(bids_clean, aes(x = factor(PUBLISHER_ID), y = PRICE_final,
                       fill = factor(PUBLISHER_ID))) +
  geom_boxplot(show.legend = FALSE) + # boxplots for each publisher
coord_flip() + # horizontal orientation for readability
scale_fill_viridis_d(option = "cividis") +
labs(
title = "Bid Distributions by Publishers",
x = "Publisher",
y = "Bid Amount"
)
The boxplot highlights how bid prices vary widely across publishers (grouped by PUBLISHER_ID). Some publishers show consistently low and tightly clustered bids, while others show much higher bids with a broader spread. Several publishers show long tails and many outliers, indicating occasional aggressive bidding. A few publishers also stand out with significantly higher median bid values than the rest. Overall, the plot shows that bid distributions differ substantially across publishers, with some concentrated at conservative prices and others frequently reaching the upper end of the bid range.
# Hexbin map showing spatial bid patterns by publisher
ggplot(bids_clean, aes(DEVICE_GEO_LONG_clean, DEVICE_GEO_LAT_clean)) +
  stat_summary_hex(aes(z = PRICE_final), fun = mean) + # hexbin with mean price per hex
  coord_equal() + # preserve geographic proportions
  facet_wrap(~ PUBLISHER_ID) + # separate map panel per publisher
  # One fill scale only: cividis on a log scale for contrast
  scale_fill_viridis_c(
    option = "cividis",
    trans = "log",
    name = "Avg Bid Price\n(log scale)"
  ) +
  labs(
    title = "Spatial Patterns of Publisher Bid Prices Across Oregon",
    x = "Longitude",
    y = "Latitude"
  )
The faceted map shows how publishers vary not only in bid prices but also in where those bids appear across Oregon. Some publishers have activity spread broadly throughout the state, while others are concentrated in just a few regions. Within each publisher’s panel, the color shading highlights differences in bid intensity: lighter areas represent higher average prices, while darker areas reflect lower prices. The patterns suggest that bidding differs by geographic area and that bid prices may depend on where users are located. Overall, the map reveals a distinct spatial footprint for each publisher, with both bid levels and geographic focus varying considerably from one publisher to another.
# Clean binary column from the original character ones
bids_clean$BID_WON_clean <- ifelse(bids_clean$BID_WON_clean == "TRUE", 1,
ifelse(bids_clean$BID_WON_clean == "FALSE", 0, NA))
# drop NAs
bids_clean <- bids_clean %>%
filter(
!is.na(BID_WON_clean),
!is.na(PRICE_final),
!is.na(DEVICE_GEO_LONG_clean),
!is.na(DEVICE_GEO_LAT_clean)
)
set.seed(123) # ensure the train/test split is reproducible
# Determine number of rows in the dataset
n <- nrow(bids_clean)
# Randomly select 60% of rows for training
train_index <- sample(seq_len(n), size = floor(0.6 * n)) # floor() makes the sample size an explicit integer
# Split the dataset into training and testing sets
train_data <- bids_clean[train_index, ] # 60% training data
test_data <- bids_clean[-train_index, ] # remaining 40% test data
# Fit a BAM model (fast GAM) to predict win probability
model <- bam(
BID_WON_clean ~
s(PRICE_final, k = 10) + # smooth effect of bid price
te(DEVICE_GEO_LONG_clean, DEVICE_GEO_LAT_clean, # 2D smooth for spatial effects
k = c(8, 8)),
data = train_data, # training dataset
family = binomial(), # binary outcome (win or lose)
discrete = TRUE # speed optimization for big data
)
summary(model) # show model fit statistics and significance
##
## Family: binomial
## Link function: logit
##
## Formula:
## BID_WON_clean ~ s(PRICE_final, k = 10) + te(DEVICE_GEO_LONG_clean,
## DEVICE_GEO_LAT_clean, k = c(8, 8))
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.81966 0.09377 19.41 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(PRICE_final) 8.833 8.983 65469 <2e-16 ***
## te(DEVICE_GEO_LONG_clean,DEVICE_GEO_LAT_clean) 49.184 53.880 1445 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.345 Deviance explained = 30.2%
## fREML = 3.5153e+05 Scale est. = 1 n = 264589
The model shows a strong relationship between bid price and the chance of winning the auction. The smooth term for bid amount (s(PRICE_final)) is highly significant (p < 2e-16), meaning win probability changes with bid price. The significance test alone does not give the direction of the effect, but the predicted-probability plot further down shows win probability rising with price, which aligns with how auction systems typically function.
The spatial smooth term (te(DEVICE_GEO_LONG_clean, DEVICE_GEO_LAT_clean), a tensor-product smooth of longitude and latitude) is also significant, indicating that location matters as well: some geographic areas appear to be more competitive than others.
The model explains about 30% of the deviance and has an adjusted R-squared of 0.345, which is reasonable for behavioral auction data. Overall, the results suggest that while bid amount is a major driver of winning probability, location-based competition also plays an important role.
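The quantities quoted from `summary(model)` can also be pulled out programmatically, and the direction of the price effect can be checked from predictions rather than from the p-value. The sketch below is a minimal illustration on simulated data with `mgcv::gam()` (used here instead of `bam()` only because the toy data is small); the variable names and the simulated relationship are assumptions for the example.

```r
library(mgcv)

set.seed(1)
n_toy <- 500
toy <- data.frame(price = runif(n_toy, 0, 1))
# Simulated outcome: win probability rises with price (assumed for illustration)
toy$won <- rbinom(n_toy, 1, plogis(-2 + 5 * toy$price))

# Binomial GAM with a smooth effect of price
fit <- gam(won ~ s(price, k = 5), data = toy,
           family = binomial(), method = "REML")

fit_summary <- summary(fit)
dev_expl <- fit_summary$dev.expl # proportion of deviance explained
r_sq_adj <- fit_summary$r.sq     # adjusted R-squared

# Direction check: predicted win probability at a low and a high price
p_low_high <- predict(fit, newdata = data.frame(price = c(0.1, 0.9)),
                      type = "response")
print(c(dev_expl = dev_expl, r_sq_adj = r_sq_adj))
print(p_low_high)
```

Comparing predictions at two prices is a simple way to state the direction of a smooth effect, since a significant smooth term only says the effect is not flat.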
# Predict on test data
test_data$pred_prob <- predict(model, newdata = test_data, type = "response")
# Compute AUC
roc_obj <- roc(test_data$BID_WON_clean, test_data$pred_prob)
auc_value <- auc(roc_obj)
print(auc_value)
## Area under the curve: 0.86
The AUC of 0.86 shows that our model separates winning and losing bids well: about 86% of the time, it assigns a higher win probability to a randomly chosen actual winner than to a randomly chosen loser. This indicates good discriminative power on the held-out test set, which supports using the model to compare the likelihood of winning across bid prices and locations. Because AUC measures how well the model ranks bids rather than how accurate the probabilities themselves are, the predicted probabilities should be treated as estimates.
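The pairwise reading of AUC (the share of winner/loser pairs in which the winner gets the higher score) can be verified by hand on a tiny example. The labels and scores below are made up for illustration; the calculation uses base R only and is not the pROC implementation.

```r
# Hypothetical outcomes (1 = won, 0 = lost) and predicted probabilities
toy_labels <- c(1, 1, 1, 0, 0, 0)
toy_scores <- c(0.9, 0.8, 0.3, 0.7, 0.2, 0.1)

pos_scores <- toy_scores[toy_labels == 1]
neg_scores <- toy_scores[toy_labels == 0]

# Compare every winner with every loser; ties count as half a win
pair_wins <- outer(pos_scores, neg_scores, ">") +
  0.5 * outer(pos_scores, neg_scores, "==")

# AUC = fraction of winner/loser pairs ranked correctly
auc_manual <- mean(pair_wins)
print(auc_manual)
```

Here 8 of the 9 winner/loser pairs are ranked correctly, so the manual AUC is 8/9; the single miss is the winner scored 0.3 against the loser scored 0.7.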
# Show predicted win probability in relation to bid price
ggplot(test_data, aes(x = PRICE_final, y = pred_prob, color = pred_prob)) +
  geom_point(alpha = 0.3) +
  geom_smooth(color = "black", se = FALSE, linewidth = 1.2) + # linewidth replaces the deprecated size argument for lines
scale_color_viridis_c(option = "cividis") +
labs(
title = "Predicted Win Probability vs Bid Price",
x = "Bid Price",
y = "Predicted Win Probability",
color = "Win Prob"
) +
theme_minimal(base_size = 13)
The figure shows how win probability changes with bid price. Win chances rise quickly at low prices, meaning small increases make a big difference. As prices move higher, the improvement slows, and eventually the probability levels off, suggesting that once a bid is competitive, raising the price further adds little benefit.
# Predict win probability using your trained model
bids_clean$pred_prob <- predict(
model,
newdata = bids_clean,
type = "response"
)
# Prepare data for hex plotting
bids_pred_hex <- bids_clean %>%
mutate(
DEVICE_GEO_LONG_clean = as.numeric(DEVICE_GEO_LONG_clean),
DEVICE_GEO_LAT_clean = as.numeric(DEVICE_GEO_LAT_clean)
) %>%
filter(
is.finite(DEVICE_GEO_LONG_clean),
is.finite(DEVICE_GEO_LAT_clean),
is.finite(pred_prob)
)
# Oregon state outline
or_state <- states(cb = TRUE, year = 2023, class = "sf") %>%
filter(STUSPS == "OR") %>%
st_transform(4326)
# Counties shapefile for Oregon
or_counties <- tigris::counties(state = "OR", cb = TRUE, year = 2023, class = "sf") %>%
sf::st_transform(4326)
# Plot Model-Predicted Win Probability
ggplot() +
# Oregon boundary
geom_sf(data = or_state, fill = "gray95", color = "gray50", linewidth = 0.4) +
# Oregon counties
geom_sf(data = or_counties, fill = NA, color = "gray60", linewidth = 0.3) +
# Predicted win-probability hex map
stat_summary_hex(
data = bids_pred_hex,
aes(
x = DEVICE_GEO_LONG_clean,
y = DEVICE_GEO_LAT_clean,
z = pred_prob
),
fun = mean,
bins = 50,
alpha = 0.85
) +
# Color scale for predictions
scale_fill_viridis_c(
option = "plasma",
name = "Predicted\nWin Probability",
limits = c(0, 1)
) +
coord_sf(expand = FALSE) +
labs(
title = "Predicted Win Probability Across Oregon Cities",
subtitle = "Spatial representation using Big Additive Model probability estimates",
x = "Longitude",
y = "Latitude"
) +
theme_minimal() +
theme(
panel.grid = element_blank(),
plot.title = element_text(face = "bold", size = 15),
plot.subtitle = element_text(size = 11),
legend.position = "right"
)
# convert back to char
bids_clean$BID_WON_clean <- ifelse(bids_clean$BID_WON_clean == 1, "TRUE", "FALSE")
The map shows how predicted win probabilities vary across Oregon. Higher probabilities cluster around major populated areas, especially in the northwest region, while many rural areas show lower or more scattered win chances. This suggests that auction competitiveness and bidding dynamics differ by location, with some cities consistently offering more favorable conditions for winning bids than others.
#Library
library(dplyr)
library(ggplot2)
library(tidyr)
library(corrplot)
library(lubridate)
library(forcats)
library(gridExtra)
library(ggrepel)
#1. Which features are most predictive of a win?
# Define Cramér's V function
cramers_v <- function(x, y) {
  # Create contingency table
  contingency_table <- table(x, y)
  # Chi-squared test without Yates' continuity correction, which
  # chisq.test() applies to 2x2 tables by default and which lowers the statistic
  chi2 <- chisq.test(contingency_table, correct = FALSE)
  # Get sample size and the smaller table dimension
  n <- sum(contingency_table)
  k <- min(dim(contingency_table))
  # Calculate Cramér's V
  v <- sqrt(chi2$statistic / (n * (k - 1)))
  return(as.numeric(v))
}
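To make the scale of Cramér's V concrete, the sketch below evaluates the same formula, V = sqrt(chi-squared / (n * (k - 1))), on two made-up 2x2 cases: one with perfect association and one with none. The helper `toy_cramers_v` and the toy vectors are hypothetical and exist only for this illustration.

```r
# Hypothetical helper: Cramér's V from two categorical vectors
toy_cramers_v <- function(x, y) {
  tab <- table(x, y)
  # correct = FALSE turns off the 2x2 continuity correction
  chi2 <- chisq.test(tab, correct = FALSE)$statistic
  n <- sum(tab)
  k <- min(dim(tab))
  as.numeric(sqrt(chi2 / (n * (k - 1))))
}

group <- rep(c("a", "b"), each = 10)

# Perfect association: group "a" always wins, group "b" always loses
outcome_perfect <- rep(c("win", "loss"), each = 10)
v_perfect <- toy_cramers_v(group, outcome_perfect)

# No association: both groups win exactly half the time
outcome_none <- rep(c("win", "loss"), times = 10)
v_none <- toy_cramers_v(group, outcome_none)

print(c(perfect = v_perfect, none = v_none))
```

V runs from 0 (no association) to 1 (perfect association), so values from the categorical analysis can be compared across variables with different numbers of categories.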
# Now run the categorical analysis again
cat("\n")
cat(strrep("=", 70), "\n")
## ======================================================================
cat("CATEGORICAL VARIABLE ANALYSIS WITH BID_WON\n")
## CATEGORICAL VARIABLE ANALYSIS WITH BID_WON
cat(strrep("=", 70), "\n\n")
## ======================================================================
# Define categorical variables for analysis
categorical_vars <- c("DEVICE_TYPE_clean", "DEVICE_GEO_REGION_clean",
"DEVICE_GEO_CITY_clean", "SIZE", "DAY_OF_WEEK")
# Prepare the data for correlation analysis
bids_analysis <- bids_clean %>%
# Convert BID_WON to numeric (0/1)
mutate(
BID_WON_numeric = as.numeric(as.logical(BID_WON_clean)),
# Extract date/time features
HOUR = hour(TIMESTAMP_clean),
DAY_OF_WEEK = wday(DATE_UTC_clean, label = TRUE, week_start = 1),
DAY_OF_WEEK_NUM = as.numeric(DAY_OF_WEEK),
MONTH = month(DATE_UTC_clean),
DAY = day(DATE_UTC_clean),
# Convert DEVICE_TYPE to factor
DEVICE_TYPE_fct = as.factor(DEVICE_TYPE_clean),
# Create ad size area from SIZE column
SIZE_WIDTH = as.numeric(str_extract(SIZE, "^[0-9]+")),
SIZE_HEIGHT = as.numeric(str_extract(SIZE, "[0-9]+$")),
SIZE_AREA = SIZE_WIDTH * SIZE_HEIGHT,
# Extract number of requested sizes
NUM_REQUESTED_SIZES = map_int(REQUESTED_SIZES_clean, length),
# Create region indicator
REGION_OR = as.numeric(DEVICE_GEO_REGION_clean == "OR"),
# Log-transform price for better distribution
PRICE_log = log(PRICE_final + 0.001),
# Log-transform response time
RESPONSE_TIME_log = log(RESPONSE_TIME_clean + 1)
)
categorical_results <- list()
for (var in categorical_vars) {
if (var %in% names(bids_analysis)) {
# Remove NA values for this analysis
temp_data <- bids_analysis %>%
select(!!sym(var), BID_WON_numeric) %>%
filter(!is.na(!!sym(var))) %>%
na.omit()
if(nrow(temp_data) > 0 && length(unique(temp_data[[var]])) > 1) {
# Calculate win rates
win_rates <- temp_data %>%
group_by(!!sym(var)) %>%
summarise(
count = n(),
win_rate = mean(BID_WON_numeric, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(desc(win_rate))
# Calculate Cramér's V with error handling
tryCatch({
cramer_v <- cramers_v(temp_data[[var]], temp_data$BID_WON_numeric)
cat("Variable:", var, "\n")
cat("Cramér's V:", round(cramer_v, 4), "\n")
cat("Sample size:", nrow(temp_data), "\n")
cat("Unique categories:", length(unique(temp_data[[var]])), "\n")
if(nrow(win_rates) <= 10) {
cat("All categories:\n")
print(win_rates)
} else {
cat("Top 5 categories by win rate:\n")
print(head(win_rates, 5))
cat("\nBottom 5 categories by win rate:\n")
print(tail(win_rates, 5))
}
# Store results
categorical_results[[var]] <- list(
cramer_v = cramer_v,
win_rates = win_rates,
n = nrow(temp_data),
n_categories = length(unique(temp_data[[var]]))
)
}, error = function(e) {
cat("Variable:", var, "\n")
cat("Could not calculate Cramér's V:", e$message, "\n")
})
cat("\n")
cat(strrep("-", 50), "\n\n")
} else {
cat("Variable:", var, "\n")
cat("Insufficient data or only one category\n\n")
cat(strrep("-", 50), "\n\n")
}
}
}
## Variable: DEVICE_TYPE_clean
## Cramér's V: 0.1328
## Sample size: 440983
## Unique categories: 5
## All categories:
## # A tibble: 5 × 3
## DEVICE_TYPE_clean count win_rate
## <chr> <int> <dbl>
## 1 2 24583 0.474
## 2 0 3694 0.370
## 3 1 228990 0.289
## 4 5 318 0.286
## 5 4 183398 0.223
##
## --------------------------------------------------
##
## Variable: DEVICE_GEO_REGION_clean
## Insufficient data or only one category
##
## --------------------------------------------------
## Variable: DEVICE_GEO_CITY_clean
## Cramér's V: 0.0927
## Sample size: 440983
## Unique categories: 202
## Top 5 categories by win rate:
## # A tibble: 5 × 3
## DEVICE_GEO_CITY_clean count win_rate
## <chr> <int> <dbl>
## 1 Canyon City 1 1
## 2 Columbia City 1 1
## 3 Drewsey 1 1
## 4 Government Camp 1 1
## 5 Weston 1 1
##
## Bottom 5 categories by win rate:
## # A tibble: 5 × 3
## DEVICE_GEO_CITY_clean count win_rate
## <chr> <int> <dbl>
## 1 North Powder 84 0.119
## 2 Oakridge 60 0.117
## 3 Prospect 9 0.111
## 4 O'Brien 27 0.111
## 5 Elmira 100 0.1
##
## --------------------------------------------------
## Variable: SIZE
## Cramér's V: 0.1471
## Sample size: 440983
## Unique categories: 38
## Top 5 categories by win rate:
## # A tibble: 5 × 3
## SIZE count win_rate
## <chr> <int> <dbl>
## 1 1080x1080 2 1
## 2 1080x566 1 1
## 3 1140x635 3 1
## 4 320x106 2 1
## 5 750x570 1 1
##
## Bottom 5 categories by win rate:
## # A tibble: 5 × 3
## SIZE count win_rate
## <chr> <int> <dbl>
## 1 300x251 319 0.0658
## 2 1140x600 1 0
## 3 1280x1280 1 0
## 4 970x500 9 0
## 5 970x66 1 0
##
## --------------------------------------------------
## Variable: DAY_OF_WEEK
## Cramér's V: NaN
## Sample size: 440983
## Unique categories: 2
## All categories:
## # A tibble: 2 × 3
## DAY_OF_WEEK count win_rate
## <ord> <int> <dbl>
## 1 Wed 273426 0.273
## 2 Tue 167557 0.272
##
## --------------------------------------------------
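DAY_OF_WEEK reports a Cramér's V of NaN even though both days have large samples. One plausible cause, assuming DAY_OF_WEEK is an ordered factor that still carries all seven weekday levels (as `wday(label = TRUE)` returns), is that `table()` keeps the five unobserved days as all-zero rows. Their expected counts are zero, so the chi-squared statistic becomes 0/0. The sketch below reproduces this on toy data (not the project data). `cramers_v_dropped` is a hypothetical variant, not part of the original analysis.

```r
# Cramér's V as defined in this document (warnings silenced for the demo)
cramers_v <- function(x, y) {
  tab <- table(x, y)
  chi2 <- suppressWarnings(chisq.test(tab))
  n <- sum(tab)
  k <- min(dim(tab))
  as.numeric(sqrt(chi2$statistic / (n * (k - 1))))
}

# Hypothetical variant: drop unused factor levels before tabulating,
# and skip Yates' continuity correction on 2x2 tables
cramers_v_dropped <- function(x, y) {
  tab <- table(droplevels(as.factor(x)), droplevels(as.factor(y)))
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE))
  n <- sum(tab)
  k <- min(dim(tab))
  as.numeric(sqrt(chi2$statistic / (n * (k - 1))))
}

# Toy data: only Tue and Wed are observed, but all seven levels exist
day <- factor(rep(c("Tue", "Wed"), times = c(40, 60)),
              levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"),
              ordered = TRUE)
won <- c(rep(c(0, 1), times = c(30, 10)), rep(c(0, 1), times = c(40, 20)))

v_raw <- cramers_v(day, won)              # empty rows give 0/0 in the statistic
v_dropped <- cramers_v_dropped(day, won)  # computed on the 2x2 table only

print(v_raw)
print(v_dropped)
```

With the empty levels removed, the toy statistic is finite. Whether to keep the continuity correction on a 2x2 table is a separate choice; this sketch turns it off.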
# Let's also calculate some simpler metrics for categorical variables
cat("\n")
cat(strrep("=", 70), "\n")
## ======================================================================
cat("ALTERNATIVE CATEGORICAL ANALYSIS: RELATIVE WIN RATES\n")
## ALTERNATIVE CATEGORICAL ANALYSIS: RELATIVE WIN RATES
cat(strrep("=", 70), "\n\n")
## ======================================================================
# For each categorical variable, calculate the range of win rates
for (var in categorical_vars) {
if (var %in% names(bids_analysis)) {
temp_data <- bids_analysis %>%
select(!!sym(var), BID_WON_numeric) %>%
filter(!is.na(!!sym(var))) %>%
na.omit()
if(nrow(temp_data) > 0 && length(unique(temp_data[[var]])) > 1) {
win_summary <- temp_data %>%
group_by(!!sym(var)) %>%
summarise(
count = n(),
win_rate = mean(BID_WON_numeric, na.rm = TRUE),
.groups = "drop"
) %>%
filter(count > 10) # Only consider categories with enough data
if(nrow(win_summary) > 1) {
win_range <- max(win_summary$win_rate) - min(win_summary$win_rate)
best_category <- win_summary[which.max(win_summary$win_rate), ]
worst_category <- win_summary[which.min(win_summary$win_rate), ]
cat("Variable:", var, "\n")
cat("Win rate range:", round(win_range, 4), "\n")
cat("Best category:", best_category[[1]],
"(win rate:", round(best_category$win_rate, 4),
", n:", best_category$count, ")\n")
cat("Worst category:", worst_category[[1]],
"(win rate:", round(worst_category$win_rate, 4),
", n:", worst_category$count, ")\n")
cat("Relative difference:", round(best_category$win_rate / worst_category$win_rate, 2), "x\n")
cat("\n")
}
}
}
}
## Variable: DEVICE_TYPE_clean
## Win rate range: 0.2506
## Best category: 2 (win rate: 0.4739 , n: 24583 )
## Worst category: 4 (win rate: 0.2233 , n: 183398 )
## Relative difference: 2.12 x
##
## Variable: DEVICE_GEO_CITY_clean
## Win rate range: 0.6143
## Best category: Jordan Valley (win rate: 0.7143 , n: 14 )
## Worst category: Elmira (win rate: 0.1 , n: 100 )
## Relative difference: 7.14 x
##
## Variable: SIZE
## Win rate range: 0.7913
## Best category: 320x107 (win rate: 0.8571 , n: 14 )
## Worst category: 300x251 (win rate: 0.0658 , n: 319 )
## Relative difference: 13.02 x
##
## Variable: DAY_OF_WEEK
## Win rate range: 0.0015
## Best category: 3 (win rate: 0.2735 , n: 273426 )
## Worst category: 2 (win rate: 0.2719 , n: 167557 )
## Relative difference: 1.01 x
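In the DAY_OF_WEEK block above, the best and worst categories print as 3 and 2 rather than as day names. A likely explanation, assuming DAY_OF_WEEK is the ordered factor created with `wday(..., label = TRUE, week_start = 1)`, is that `cat()` prints a factor's underlying integer codes. Under that assumption 3 and 2 are Wed and Tue, which matches the counts in the earlier win-rate table. A minimal sketch with a toy factor (not the project data):

```r
# Toy ordered factor with Monday as the first level
days <- factor(c("Wed", "Tue"),
               levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"),
               ordered = TRUE)

# cat() ignores the factor labels and prints the integer codes
codes_printed <- capture.output(cat(days, "\n"))

# Converting to character first keeps the labels
labels_printed <- capture.output(cat(as.character(days), "\n"))

print(codes_printed)
print(labels_printed)
```

Wrapping the category in `as.character()` before passing it to `cat()` would make the printed summary show the day names.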
# Let's create a comprehensive summary table of all features
cat("\n")
cat(strrep("=", 70), "\n")
## ======================================================================
cat("COMPREHENSIVE FEATURE IMPORTANCE SUMMARY\n")
## COMPREHENSIVE FEATURE IMPORTANCE SUMMARY
cat(strrep("=", 70), "\n\n")
## ======================================================================
# Collect all metrics in one data frame
feature_importance_summary <- data.frame()
# 1. Add numerical features from correlation
# First, create the numerical features dataset
numerical_features_clean <- bids_analysis %>%
select(
BID_WON_numeric,
PRICE_final,
PRICE_log,
RESPONSE_TIME_clean,
RESPONSE_TIME_log,
SIZE_WIDTH,
SIZE_HEIGHT,
SIZE_AREA,
NUM_REQUESTED_SIZES,
HOUR,
DAY_OF_WEEK_NUM,
DAY,
MONTH,
DEVICE_GEO_LAT_clean,
DEVICE_GEO_LONG_clean,
REGION_OR
) %>%
# Remove rows with any missing values for correlation
na.omit() %>%
# Remove infinite values if they exist
mutate(across(everything(), ~ ifelse(is.infinite(.), NA, .))) %>%
na.omit()
# Calculate correlation matrix
cor_matrix <- cor(numerical_features_clean, use = "complete.obs", method = "pearson")
# Get correlation with BID_WON
bid_won_correlations <- cor_matrix["BID_WON_numeric", ]
bid_won_cor_sorted <- sort(abs(bid_won_correlations), decreasing = TRUE)
# Create a nice formatted dataframe
bid_won_cor_df <- data.frame(
Feature = names(bid_won_cor_sorted),
Correlation = round(bid_won_correlations[names(bid_won_cor_sorted)], 4),
Absolute_Correlation = round(bid_won_cor_sorted, 4)
) %>%
filter(Feature != "BID_WON_numeric") # Remove self-correlation
for (feature in bid_won_cor_df$Feature) {
if(feature != "BID_WON_numeric") {
cor_value <- bid_won_cor_df$Correlation[bid_won_cor_df$Feature == feature]
feature_importance_summary <- rbind(feature_importance_summary,
data.frame(
Feature = feature,
Type = "Numerical",
Metric = "Pearson Correlation",
Value = abs(cor_value),
Direction = ifelse(cor_value > 0, "Positive", "Negative"),
Importance_Level = case_when(
abs(cor_value) >= 0.3 ~ "High",
abs(cor_value) >= 0.1 ~ "Medium",
TRUE ~ "Low"
)
))
}
}
# 2. Add categorical features from Cramér's V
for (var in names(categorical_results)) {
feature_importance_summary <- rbind(feature_importance_summary,
data.frame(
Feature = var,
Type = "Categorical",
Metric = "Cramér's V",
Value = categorical_results[[var]]$cramer_v,
Direction = "Variable",
Importance_Level = case_when(
categorical_results[[var]]$cramer_v >= 0.3 ~ "High",
categorical_results[[var]]$cramer_v >= 0.1 ~ "Medium",
TRUE ~ "Low"
)
))
}
# Sort by importance
feature_importance_summary <- feature_importance_summary %>%
arrange(desc(Value))
cat("ALL FEATURES RANKED BY PREDICTIVE POWER:\n")
## ALL FEATURES RANKED BY PREDICTIVE POWER:
cat(strrep("-", 70), "\n")
## ----------------------------------------------------------------------
print(feature_importance_summary)
## Feature Type Metric Value Direction
## 1 PRICE_log Numerical Pearson Correlation 0.51470000 Positive
## 2 PRICE_final Numerical Pearson Correlation 0.48690000 Positive
## 3 SIZE Categorical Cramér's V 0.14714903 Variable
## 4 DEVICE_TYPE_clean Categorical Cramér's V 0.13277789 Variable
## 5 SIZE_AREA Numerical Pearson Correlation 0.12050000 Positive
## 6 SIZE_HEIGHT Numerical Pearson Correlation 0.12010000 Positive
## 7 DEVICE_GEO_CITY_clean Categorical Cramér's V 0.09273084 Variable
## 8 NUM_REQUESTED_SIZES Numerical Pearson Correlation 0.04880000 Positive
## 9 RESPONSE_TIME_clean Numerical Pearson Correlation 0.03780000 Positive
## 10 SIZE_WIDTH Numerical Pearson Correlation 0.03330000 Positive
## 11 DEVICE_GEO_LAT_clean Numerical Pearson Correlation 0.02790000 Positive
## 12 RESPONSE_TIME_log Numerical Pearson Correlation 0.02660000 Positive
## 13 DEVICE_GEO_LONG_clean Numerical Pearson Correlation 0.01230000 Positive
## 14 HOUR Numerical Pearson Correlation 0.00190000 Negative
## 15 DAY_OF_WEEK_NUM Numerical Pearson Correlation 0.00170000 Positive
## 16 DAY Numerical Pearson Correlation 0.00170000 Positive
## 17 DAY_OF_WEEK Categorical Cramér's V NaN Variable
## Importance_Level
## 1 High
## 2 High
## 3 Medium
## 4 Medium
## 5 Medium
## 6 Medium
## 7 Low
## 8 Low
## 9 Low
## 10 Low
## 11 Low
## 12 Low
## 13 Low
## 14 Low
## 15 Low
## 16 Low
## 17 Low
cat(strrep("-", 70), "\n\n")
## ----------------------------------------------------------------------
# Create a visualization of feature importance
ggplot(feature_importance_summary, aes(x = reorder(Feature, Value), y = Value, fill = Importance_Level)) +
geom_bar(stat = "identity", alpha = 0.8) +
geom_text(aes(label = round(Value, 3)),
hjust = -0.1, size = 3.5) +
coord_flip() +
scale_fill_manual(values = c("High" = "#E41A1C", "Medium" = "#377EB8", "Low" = "#4DAF4A")) +
labs(
title = "Feature Importance for Predicting Bid Wins",
subtitle = "Higher values indicate stronger predictive power",
x = "Feature",
y = "Predictive Strength (Correlation or Cramér's V)",
fill = "Importance Level"
) +
theme_minimal() +
theme(legend.position = "bottom") +
scale_y_continuous(expand = expansion(mult = c(0, 0.1)))
# Get device-specific win rates
device_rates <- categorical_results$DEVICE_TYPE_clean$win_rates
best_device <- device_rates[which.max(device_rates$win_rate), ]
worst_device <- device_rates[which.min(device_rates$win_rate), ]
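Conclusion Q1: Price is king. Bid price has the strongest association with winning of any feature examined (Pearson r = 0.487 for PRICE_final and 0.515 for PRICE_log), well ahead of ad size (Cramér's V = 0.147) and device type (Cramér's V = 0.133). Because these are correlations, they suggest, but do not prove, that raising the bid price is the most reliable way to increase win probability.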
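To make the price and win relationship more concrete, the sketch below relates a 0/1 win flag to price in two ways: a Pearson correlation (as in the summary table) and a one-variable logistic regression. The data are simulated and every number in it is an assumption chosen for illustration. None of it comes from the bid dataset, and the variable names are hypothetical.

```r
set.seed(42)

# Simulated bids: right-skewed prices, win chance rising with log price
n <- 5000
price <- rlnorm(n, meanlog = -1, sdlog = 1)
log_price <- log(price + 0.001)  # same offset as PRICE_log in this document
won <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * log_price))

# Pearson correlation of the 0/1 outcome with raw and log price
r_raw <- cor(price, won)
r_log <- cor(log_price, won)

# Logistic regression: multiplicative change in the odds of winning
# for a one-unit increase in log price
fit <- glm(won ~ log_price, family = binomial())
odds_ratio <- exp(coef(fit)[["log_price"]])

print(c(r_raw = r_raw, r_log = r_log, odds_ratio = odds_ratio))
```

An odds ratio is often easier to explain in a presentation than a correlation with a binary outcome, since it states how the odds of winning change as the bid changes.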
#2. Systematic differences across devices, regions, or ad categories
# MULTIVARIATE ANALYSIS WITH ROBUST FACTOR HANDLING
cat("\n")
cat(strrep("=", 80), "\n")
## ================================================================================
cat("MULTIVARIATE ANALYSIS: SYSTEMATIC DIFFERENCES IN OUTCOMES\n")
## MULTIVARIATE ANALYSIS: SYSTEMATIC DIFFERENCES IN OUTCOMES
cat(strrep("=", 80), "\n\n")
## ================================================================================
# 1. CAREFULLY PREPARE DATA WITH FACTOR CHECKING
analysis_data <- bids_analysis %>%
# Select relevant variables
select(
BID_WON_numeric,
PRICE_final,
RESPONSE_TIME_clean,
HOUR,
DEVICE_TYPE_clean,
DEVICE_GEO_REGION_clean,
SIZE,
SIZE_AREA,
NUM_REQUESTED_SIZES
) %>%
# Remove any rows with missing values
drop_na() %>%
# Convert to proper types
mutate(
# Ensure character type
DEVICE_TYPE_clean = as.character(DEVICE_TYPE_clean),
DEVICE_GEO_REGION_clean = as.character(DEVICE_GEO_REGION_clean),
SIZE = as.character(SIZE)
)
# Check unique values for each potential factor
cat("Unique value counts before filtering:\n")
## Unique value counts before filtering:
cat("- Device types:", length(unique(analysis_data$DEVICE_TYPE_clean)), "\n")
## - Device types: 5
cat("- Regions:", length(unique(analysis_data$DEVICE_GEO_REGION_clean)), "\n")
## - Regions: 1
cat("- Ad sizes:", length(unique(analysis_data$SIZE)), "\n\n")
## - Ad sizes: 38
# Filter to ensure we have multiple levels for each factor
analysis_data_filtered <- analysis_data %>%
# Group by combination of factors and count
group_by(DEVICE_TYPE_clean, DEVICE_GEO_REGION_clean, SIZE) %>%
mutate(group_count = n()) %>%
ungroup() %>%
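# Flag, for the whole dataset, whether each factor has more than one level
# among device/region/size combinations with more than 10 rows
mutate(
# TRUE when at least two device types remain
device_ok = length(unique(DEVICE_TYPE_clean[group_count > 10])) > 1,
# TRUE when at least two regions remain
region_ok = length(unique(DEVICE_GEO_REGION_clean[group_count > 10])) > 1,
# TRUE when at least two ad sizes remain
size_ok = length(unique(SIZE[group_count > 10])) > 1
) %>%
# Keep rows only if every factor has multiple levels. Each flag is a single
# dataset-wide value, so this keeps all rows or none; with only one region in
# the data, region_ok is FALSE and every row is dropped.
filter(device_ok & region_ok & size_ok) %>%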
# Convert to factors
mutate(
DEVICE_TYPE_clean = factor(DEVICE_TYPE_clean),
DEVICE_GEO_REGION_clean = factor(DEVICE_GEO_REGION_clean),
SIZE = factor(SIZE)
) %>%
select(-device_ok, -region_ok, -size_ok, -group_count)
cat("After filtering - Unique value counts:\n")
## After filtering - Unique value counts:
cat("- Device types:", length(levels(analysis_data_filtered$DEVICE_TYPE_clean)), "\n")
## - Device types: 0
cat("- Regions:", length(levels(analysis_data_filtered$DEVICE_GEO_REGION_clean)), "\n")
## - Regions: 0
cat("- Ad sizes:", length(levels(analysis_data_filtered$SIZE)), "\n")
## - Ad sizes: 0
cat("Final sample size:", nrow(analysis_data_filtered), "\n\n")
## Final sample size: 0
# If we still don't have multiple levels, do univariate analyses instead
if (length(levels(analysis_data_filtered$DEVICE_TYPE_clean)) < 2 ||
length(levels(analysis_data_filtered$DEVICE_GEO_REGION_clean)) < 2 ||
length(levels(analysis_data_filtered$SIZE)) < 2) {
cat("WARNING: Insufficient variation for full multivariate model.\n")
cat("Performing separate univariate analyses instead.\n\n")
# 2. SEPARATE UNIVARIATE ANALYSES
cat("SEPARATE ANALYSES FOR EACH FACTOR\n")
cat(strrep("-", 80), "\n\n")
# Function for univariate analysis
run_univariate_analysis <- function(data, factor_var, factor_name) {
if (length(unique(data[[factor_var]])) > 1) {
cat(paste("ANALYSIS FOR", factor_name, ":\n"))
cat(strrep("-", 40), "\n")
# Descriptive statistics
desc_stats <- data %>%
group_by(!!sym(factor_var)) %>%
summarise(
n = n(),
win_rate = mean(BID_WON_numeric),
avg_price = mean(PRICE_final),
avg_response = mean(RESPONSE_TIME_clean),
.groups = "drop"
) %>%
arrange(desc(win_rate))
cat("Descriptive Statistics:\n")
print(desc_stats)
cat("\n")
# Statistical test (ANOVA for win rate differences)
if (nrow(desc_stats) > 1) {
aov_result <- aov(BID_WON_numeric ~ factor(data[[factor_var]]), data = data)
cat("ANOVA test for win rate differences:\n")
print(summary(aov_result))
# Tukey HSD post-hoc test if significant
if (summary(aov_result)[[1]]$"Pr(>F)"[1] < 0.05) {
cat("\nTukey HSD Post-hoc Comparisons:\n")
tukey_result <- TukeyHSD(aov_result)
print(tukey_result)
}
cat("\n")
}
# Visualize
p <- ggplot(desc_stats, aes(x = reorder(!!sym(factor_var), win_rate), y = win_rate)) +
geom_bar(stat = "identity", fill = "steelblue", alpha = 0.8) +
geom_errorbar(aes(ymin = win_rate - 1.96*sqrt(win_rate*(1-win_rate)/n),
ymax = win_rate + 1.96*sqrt(win_rate*(1-win_rate)/n)),
width = 0.2) +
geom_text(aes(label = paste0(round(win_rate*100, 1), "%\n(n=", n, ")")),
vjust = -0.3, size = 3) +
labs(
title = paste("Win Rate by", factor_name),
x = factor_name,
y = "Win Rate"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(p)
cat(strrep("=", 80), "\n\n")
} else {
cat(paste("Insufficient variation in", factor_name, "for analysis\n\n"))
}
}
# Run analyses for each factor
run_univariate_analysis(analysis_data, "DEVICE_TYPE_clean", "Device Type")
run_univariate_analysis(analysis_data, "DEVICE_GEO_REGION_clean", "Region")
run_univariate_analysis(analysis_data, "SIZE", "Ad Size")
} else {
# 3. PROCEED WITH MULTIVARIATE ANALYSIS
cat("Proceeding with multivariate analysis...\n\n")
# Model 1: Basic multivariate model
basic_model <- glm(
BID_WON_numeric ~
scale(PRICE_final) +
scale(RESPONSE_TIME_clean) +
HOUR +
scale(SIZE_AREA) +
NUM_REQUESTED_SIZES +
DEVICE_TYPE_clean +
DEVICE_GEO_REGION_clean +
SIZE,
data = analysis_data_filtered,
family = binomial()
)
cat("MODEL 1: MULTIVARIATE LOGISTIC REGRESSION\n")
cat(strrep("-", 80), "\n")
cat("Model coefficients (systematic differences after controlling for other factors):\n\n")
# Extract and format results
model_results <- broom::tidy(basic_model) %>%
mutate(
odds_ratio = exp(estimate),
significance = case_when(
p.value < 0.001 ~ "***",
p.value < 0.01 ~ "**",
p.value < 0.05 ~ "*",
TRUE ~ ""
),
# Match the most specific patterns first: "SIZE_AREA" and
# "NUM_REQUESTED_SIZES" also contain "SIZE"
variable_type = case_when(
grepl("DEVICE_TYPE", term) ~ "Device Type",
grepl("DEVICE_GEO_REGION", term) ~ "Region",
grepl("SIZE_AREA", term) ~ "Ad Size Area",
grepl("NUM_REQUESTED", term) ~ "# Requested Sizes",
grepl("SIZE", term) ~ "Ad Size",
grepl("PRICE", term) ~ "Price",
grepl("RESPONSE", term) ~ "Response Time",
grepl("HOUR", term) ~ "Hour",
TRUE ~ "Intercept"
)
) %>%
arrange(variable_type, term)
# Print systematic differences
systematic_diffs <- model_results %>%
filter(variable_type %in% c("Device Type", "Region", "Ad Size"))
if (nrow(systematic_diffs) > 0) {
cat("SYSTEMATIC DIFFERENCES:\n")
for (var_type in unique(systematic_diffs$variable_type)) {
cat(paste("\n", var_type, "Effects (vs reference category):\n"))
var_results <- systematic_diffs %>%
filter(variable_type == var_type) %>%
select(term, estimate, odds_ratio, p.value, significance)
print(var_results, n = 20)
}
}
cat("\nCONTROL VARIABLE EFFECTS:\n")
control_vars <- model_results %>%
filter(variable_type %in% c("Price", "Response Time", "Hour", "Ad Size Area", "# Requested Sizes"))
print(control_vars, n = 10)
# 4. MODEL COMPARISON TO QUANTIFY CATEGORY EFFECTS
cat("\n\nMODEL 2: MODEL COMPARISON TO QUANTIFY CATEGORY EFFECTS\n")
cat(strrep("-", 80), "\n")
# Fit models without each category to see contribution
models <- list()
# Base model without categories
models$base <- glm(
BID_WON_numeric ~ scale(PRICE_final) + scale(RESPONSE_TIME_clean) + HOUR,
data = analysis_data_filtered,
family = binomial()
)
# Add each category separately
models$with_device <- glm(
BID_WON_numeric ~ scale(PRICE_final) + scale(RESPONSE_TIME_clean) + HOUR + DEVICE_TYPE_clean,
data = analysis_data_filtered,
family = binomial()
)
models$with_region <- glm(
BID_WON_numeric ~ scale(PRICE_final) + scale(RESPONSE_TIME_clean) + HOUR + DEVICE_GEO_REGION_clean,
data = analysis_data_filtered,
family = binomial()
)
models$with_size <- glm(
BID_WON_numeric ~ scale(PRICE_final) + scale(RESPONSE_TIME_clean) + HOUR + SIZE,
data = analysis_data_filtered,
family = binomial()
)
# Compare models using AIC
model_comparison <- map_dfr(models, ~ data.frame(AIC = AIC(.x)), .id = "model") %>%
mutate(
delta_AIC = AIC - min(AIC),
improvement_over_base = AIC[model == "base"] - AIC
) %>%
arrange(AIC)
cat("Model Comparison (lower AIC is better):\n")
print(model_comparison)
# 5. PREDICTED WIN PROBABILITIES
cat("\n\nMODEL 3: PREDICTED WIN PROBABILITIES BY CATEGORY\n")
cat(strrep("-", 80), "\n")
# Create representative scenarios
median_price <- median(analysis_data_filtered$PRICE_final)
median_response <- median(analysis_data_filtered$RESPONSE_TIME_clean)
median_hour <- 12
# Get top categories for each factor
top_device <- analysis_data_filtered %>%
count(DEVICE_TYPE_clean) %>%
arrange(desc(n)) %>%
pull(DEVICE_TYPE_clean) %>%
head(3)
top_region <- analysis_data_filtered %>%
count(DEVICE_GEO_REGION_clean) %>%
arrange(desc(n)) %>%
pull(DEVICE_GEO_REGION_clean) %>%
head(3)
top_size <- analysis_data_filtered %>%
count(SIZE) %>%
arrange(desc(n)) %>%
pull(SIZE) %>%
head(3)
# Create prediction scenarios
pred_scenarios <- expand.grid(
PRICE_final = median_price,
RESPONSE_TIME_clean = median_response,
HOUR = median_hour,
SIZE_AREA = median(analysis_data_filtered$SIZE_AREA),
NUM_REQUESTED_SIZES = median(analysis_data_filtered$NUM_REQUESTED_SIZES),
DEVICE_TYPE_clean = top_device,
DEVICE_GEO_REGION_clean = top_region,
SIZE = top_size,
stringsAsFactors = TRUE
)
# Predict
predictions <- pred_scenarios %>%
mutate(
predicted_win = predict(basic_model, newdata = ., type = "response"),
scenario_id = paste(DEVICE_TYPE_clean, DEVICE_GEO_REGION_clean, SIZE, sep = " | ")
) %>%
arrange(desc(predicted_win))
cat("Top 10 Best Scenarios for Winning:\n")
print(predictions %>%
select(scenario_id, predicted_win) %>%
head(10), n = 10)
cat("\nBottom 10 Worst Scenarios for Winning:\n")
print(predictions %>%
select(scenario_id, predicted_win) %>%
tail(10), n = 10)
# 6. VISUALIZE CATEGORY EFFECTS
cat("\n\nVISUALIZATION OF SYSTEMATIC DIFFERENCES\n")
cat(strrep("-", 80), "\n")
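# Calculate model-adjusted win rates for each factor combination
adjusted_data <- analysis_data_filtered %>%
# Attach each row's own residual (observed minus fitted probability);
# calling residuals(basic_model) inside summarise() would average over all rows
mutate(model_residual = residuals(basic_model, type = "response")) %>%
group_by(DEVICE_TYPE_clean, DEVICE_GEO_REGION_clean, SIZE) %>%
summarise(
n = n(),
raw_win_rate = mean(BID_WON_numeric),
# Mean residual within the group
residual = mean(model_residual),
# Equals the group's mean fitted win probability
adjusted_win_rate = raw_win_rate - residual,
.groups = "drop"
) %>%
filter(n > 10)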
# Plot device differences
if (nrow(adjusted_data) > 0) {
p1 <- adjusted_data %>%
group_by(DEVICE_TYPE_clean) %>%
summarise(
adj_rate = weighted.mean(adjusted_win_rate, n),
n_total = sum(n)
) %>%
filter(n_total > 50) %>%
ggplot(aes(x = reorder(DEVICE_TYPE_clean, adj_rate), y = adj_rate)) +
geom_bar(stat = "identity", fill = "steelblue", alpha = 0.8) +
geom_text(aes(label = round(adj_rate, 3)),
vjust = -0.5, size = 3) +
labs(
title = "Adjusted Win Rate by Device Type",
subtitle = "Controlling for price, response time, and other factors",
x = "Device Type",
y = "Adjusted Win Rate"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(p1)
}
}
## WARNING: Insufficient variation for full multivariate model.
## Performing separate univariate analyses instead.
##
## SEPARATE ANALYSES FOR EACH FACTOR
## --------------------------------------------------------------------------------
##
## ANALYSIS FOR Device Type :
## ----------------------------------------
## Descriptive Statistics:
## # A tibble: 5 × 5
## DEVICE_TYPE_clean n win_rate avg_price avg_response
## <chr> <int> <dbl> <dbl> <dbl>
## 1 2 24583 0.474 0.696 246.
## 2 0 3694 0.370 0.589 222.
## 3 1 228990 0.289 0.493 210.
## 4 5 318 0.286 1.01 256.
## 5 4 183398 0.223 0.340 184.
##
## ANOVA test for win rate differences:
## Df Sum Sq Mean Sq F value Pr(>F)
## factor(data[[factor_var]]) 4 1543 385.7 1978 <2e-16 ***
## Residuals 440978 85958 0.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Tukey HSD Post-hoc Comparisons:
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BID_WON_numeric ~ factor(data[[factor_var]]), data = data)
##
## $`factor(data[[factor_var]])`
## diff lwr upr p adj
## 1-0 -0.080300223 -0.100274407 -0.06032604 0.0000000
## 2-0 0.104075206 0.082823525 0.12532689 0.0000000
## 4-0 -0.146536685 -0.166550246 -0.12652312 0.0000000
## 5-0 -0.083625325 -0.154007220 -0.01324343 0.0104672
## 2-1 0.184375428 0.176292508 0.19245835 0.0000000
## 4-1 -0.066236462 -0.070010358 -0.06246257 0.0000000
## 5-1 -0.003325102 -0.070906984 0.06425678 0.9999271
## 4-2 -0.250611890 -0.258791632 -0.24243215 0.0000000
## 5-2 -0.187700530 -0.255670941 -0.11973012 0.0000000
## 5-4 0.062911360 -0.004682171 0.13050489 0.0822208
## ================================================================================
##
## Insufficient variation in Region for analysis
##
## ANALYSIS FOR Ad Size :
## ----------------------------------------
## Descriptive Statistics:
## # A tibble: 38 × 5
## SIZE n win_rate avg_price avg_response
## <chr> <int> <dbl> <dbl> <dbl>
## 1 1080x1080 2 1 2.64 215
## 2 1080x566 1 1 1.16 208
## 3 1140x635 3 1 1.31 565
## 4 320x106 2 1 1.43 246.
## 5 750x570 1 1 6.26 309
## 6 320x107 14 0.857 0.789 122
## 7 640x480 91 0.791 0.564 187.
## 8 1140x250 3 0.667 1.08 271
## 9 640x360 6 0.667 0.564 146.
## 10 480x360 5 0.6 0.631 193.
## # ℹ 28 more rows
##
## ANOVA test for win rate differences:
## Df Sum Sq Mean Sq F value Pr(>F)
## factor(data[[factor_var]]) 37 1895 51.21 263.8 <2e-16 ***
## Residuals 440945 85606 0.19
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Tukey HSD Post-hoc Comparisons:
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BID_WON_numeric ~ factor(data[[factor_var]]), data = data)
##
## $`factor(data[[factor_var]])`
## diff lwr upr p adj
## 1080x1080-0x0 6.588889e-01 -0.5460674653 1.863845243 0.9854964
## 1080x566-0x0 6.588889e-01 -1.0442318633 2.362009641 0.9999868
## 1140x250-0x0 3.255556e-01 -0.6588324041 1.309943515 0.9999998
## 1140x600-0x0 -3.411111e-01 -2.0442318633 1.362009641 1.0000000
## 1140x635-0x0 6.588889e-01 -0.3254990708 1.643276849 0.8231857
## 1200x250-0x0 -7.777778e-03 -0.9921657374 0.976610182 1.0000000
## 1237x500-0x0 -9.419753e-02 -0.2394714980 0.051076436 0.8702139
## 1280x1280-0x0 -3.411111e-01 -2.0442318633 1.362009641 1.0000000
## 160x600-0x0 2.401163e-01 0.1231579328 0.357074719 0.0000000
## 180x150-0x0 -9.111111e-02 -0.9440880011 0.761865779 1.0000000
## 1x1-0x0 2.427350e-02 -0.2184991953 0.267046204 1.0000000
## 250x250-0x0 1.588889e-01 -0.6940880011 1.011865779 1.0000000
## 300x100-0x0 -1.744444e-01 -0.6690851368 0.320196248 0.9999987
## 300x1050-0x0 -2.146743e-01 -0.3343425689 -0.095006090 0.0000000
## 300x250-0x0 1.014517e-02 -0.0467880776 0.067078418 1.0000000
## 300x251-0x0 -2.752804e-01 -0.3861952044 -0.164365576 0.0000000
## 300x50-0x0 -6.186382e-02 -0.1197250739 -0.004002563 0.0186098
## 300x600-0x0 5.404241e-02 -0.0061095725 0.114194386 0.1708765
## 320x100-0x0 -2.577778e-01 -0.3897712636 -0.125784292 0.0000000
## 320x106-0x0 6.588889e-01 -0.5460674653 1.863845243 0.9854964
## 320x107-0x0 5.160317e-01 0.0575815895 0.974481903 0.0076697
## 320x480-0x0 -7.444444e-02 -0.5175915967 0.368702708 1.0000000
## 320x50-0x0 -1.163429e-01 -0.1731776339 -0.059508150 0.0000000
## 325x508-0x0 -1.808547e-01 -0.2926843747 -0.069025027 0.0000003
## 336x280-0x0 1.935401e-02 -0.1008523619 0.139560372 1.0000000
## 468x60-0x0 -1.154474e-01 -0.2420953025 0.011200514 0.1481709
## 480x360-0x0 2.588889e-01 -0.5044586904 1.022236468 0.9999996
## 600x300-0x0 -7.777778e-03 -0.9921657374 0.976610182 1.0000000
## 620x366-0x0 -1.411111e-01 -0.2871525108 0.004930289 0.0776015
## 640x360-0x0 3.255556e-01 -0.3716671454 1.022778257 0.9991233
## 640x480-0x0 4.500977e-01 0.2628574456 0.637337915 0.0000000
## 728x90-0x0 5.946401e-02 -0.0008346331 0.119762662 0.0600874
## 750x570-0x0 6.588889e-01 -1.0442318633 2.362009641 0.9999868
## 970x250-0x0 1.729451e-01 0.1067412676 0.239148960 0.0000000
## 970x500-0x0 -3.411111e-01 -0.9113328001 0.229110578 0.9485090
## 970x66-0x0 -3.411111e-01 -2.0442318633 1.362009641 1.0000000
## 970x90-0x0 -6.169935e-02 -0.2182999634 0.094901271 0.9999795
## 1080x566-1080x1080 9.319212e-13 -2.0847305445 2.084730545 1.0000000
## 1140x250-1080x1080 -3.333333e-01 -1.8871997374 1.220533071 1.0000000
## 1140x600-1080x1080 -1.000000e+00 -3.0847305445 1.084730545 0.9985134
## 1140x635-1080x1080 2.704503e-13 -1.5538664041 1.553866404 1.0000000
## 1200x250-1080x1080 -6.666667e-01 -2.2205330708 0.887199737 0.9998526
## 1237x500-1080x1080 -7.530864e-01 -1.9641131214 0.457940282 0.9176190
## 1280x1280-1080x1080 -1.000000e+00 -3.0847305445 1.084730545 0.9985134
## 160x600-1080x1080 -4.187726e-01 -1.6267296867 0.789184560 0.9999992
## 180x150-1080x1080 -7.500000e-01 -2.2241271050 0.724127105 0.9955999
## 1x1-1080x1080 -6.346154e-01 -1.8611632762 0.591932507 0.9941098
## 250x250-1080x1080 -5.000000e-01 -1.9741271050 0.974127105 0.9999996
## 300x100-1080x1080 -8.333333e-01 -2.1333912402 0.466724574 0.8848482
## 300x1050-1080x1080 -8.735632e-01 -2.0817857286 0.334659292 0.6698159
## 300x250-1080x1080 -6.487437e-01 -1.8523726238 0.554885187 0.9884321
## 300x251-1080x1080 -9.341693e-01 -2.1415562272 0.273217669 0.5031424
## 300x50-1080x1080 -7.207527e-01 -1.9244258654 0.482920451 0.9478638
## 300x600-1080x1080 -6.048465e-01 -1.8086319312 0.598938967 0.9964767
## 320x100-1080x1080 -9.166667e-01 -2.1261721139 0.292838781 0.5555522
## 320x106-1080x1080 4.665157e-13 -1.7021753617 1.702175362 1.0000000
## 320x107-1080x1080 -1.428571e-01 -1.4295807700 1.143866484 1.0000000
## 320x480-1080x1080 -7.333333e-01 -2.0146843958 0.548017729 0.9715508
## 320x50-1080x1080 -7.752318e-01 -1.9788560304 0.428392469 0.8788936
## 325x508-1080x1080 -8.397436e-01 -2.0472149238 0.367727744 0.7535795
## 336x280-1080x1080 -6.395349e-01 -1.8478108114 0.568741044 0.9913750
## 468x60-1080x1080 -7.743363e-01 -1.9832700445 0.434597478 0.8857681
## 480x360-1080x1080 -4.000000e-01 -1.8241420833 1.024142083 1.0000000
## 600x300-1080x1080 -6.666667e-01 -2.2205330708 0.887199737 0.9998526
## 620x366-1080x1080 -8.000000e-01 -2.0111190020 0.411119002 0.8439580
## 640x360-1080x1080 -3.333333e-01 -1.7231536963 1.056487030 1.0000000
## 640x480-1080x1080 -2.087912e-01 -1.4255656546 1.007983237 1.0000000
## 728x90-1080x1080 -5.994249e-01 -1.8032176610 0.604367913 0.9970063
## 750x570-1080x1080 5.966339e-13 -2.0847305445 2.084730545 1.0000000
## 970x250-1080x1080 -4.859438e-01 -1.6900468006 0.718159250 0.9999636
## 970x500-1080x1080 -1.000000e+00 -2.3306516904 0.330651690 0.5767240
## 970x66-1080x1080 -1.000000e+00 -3.0847305445 1.084730545 0.9985134
## 970x90-1080x1080 -7.205882e-01 -1.9330258213 0.491849351 0.9526289
## 1140x250-1080x566 -3.333333e-01 -2.2988361400 1.632169473 1.0000000
## 1140x600-1080x566 -1.000000e+00 -3.4072394821 1.407239482 0.9999287
## 1140x635-1080x566 -6.614709e-13 -1.9655028066 1.965502807 1.0000000
## 1200x250-1080x566 -6.666667e-01 -2.6321694733 1.298836140 0.9999996
## 1237x500-1080x566 -7.530864e-01 -2.4605073266 0.954334487 0.9997307
## 1280x1280-1080x566 -1.000000e+00 -3.4072394821 1.407239482 0.9999287
## 160x600-1080x566 -4.187726e-01 -2.1240176756 1.286472549 1.0000000
## 180x150-1080x566 -7.500000e-01 -2.6530899092 1.153089909 0.9999793
## 1x1-1080x566 -6.346154e-01 -2.3530798769 1.083849108 0.9999959
## 250x250-1080x566 -5.000000e-01 -2.4030899092 1.403089909 1.0000000
## 300x100-1080x566 -8.333333e-01 -2.6050136212 0.938346955 0.9989860
## 300x1050-1080x566 -8.735632e-01 -2.5789963350 0.831869898 0.9950479
## 300x250-1080x566 -6.487437e-01 -2.3509255604 1.053438123 0.9999909
## 300x251-1080x566 -9.341693e-01 -2.6390105396 0.770671982 0.9850243
## 300x50-1080x566 -7.207527e-01 -2.4229658410 0.981460426 0.9998901
## 300x600-1080x566 -6.048465e-01 -2.3071390212 1.097446057 0.9999985
## 320x100-1080x566 -9.166667e-01 -2.6230089266 0.789675593 0.9890262
## 320x106-1080x566 -4.654055e-13 -2.0847305445 2.084730545 1.0000000
## 320x107-1080x566 -1.428571e-01 -1.9047760325 1.619061747 1.0000000
## 320x480-1080x566 -7.333333e-01 -2.4913324876 1.024665821 0.9999216
## 320x50-1080x566 -7.752318e-01 -2.4774103305 0.926946769 0.9994715
## 325x508-1080x566 -8.397436e-01 -2.5446446143 0.865157435 0.9975495
## 336x280-1080x566 -6.395349e-01 -2.3450058446 1.065936077 0.9999939
## 468x60-1080x566 -7.743363e-01 -2.4802733630 0.931600797 0.9995070
## 480x360-1080x566 -4.000000e-01 -2.2646396849 1.464639685 1.0000000
## 600x300-1080x566 -6.666667e-01 -2.6321694733 1.298836140 0.9999996
## 620x366-1080x566 -8.000000e-01 -2.5074863742 0.907486374 0.9990618
## 640x360-1080x566 -3.333333e-01 -2.1718928571 1.505226190 1.0000000
## 640x480-1080x566 -2.087912e-01 -1.9202936286 1.502711211 1.0000000
## 728x90-1080x566 -5.994249e-01 -2.3017226022 1.102872854 0.9999988
## 750x570-1080x566 -3.352874e-13 -2.4072394821 2.407239482 1.0000000
## 970x250-1080x566 -4.859438e-01 -2.1884609048 1.216573355 1.0000000
## 970x500-1080x566 -1.000000e+00 -2.7942503734 0.794250373 0.9806289
## 970x66-1080x566 -1.000000e+00 -3.4072394821 1.407239482 0.9999287
## 970x90-1080x566 -7.205882e-01 -2.4290101331 0.987833663 0.9998993
## 1140x600-1140x250 -6.666667e-01 -2.6321694733 1.298836140 0.9999996
## 1140x635-1140x250 3.333333e-01 -1.0564870297 1.723153696 1.0000000
## 1200x250-1140x250 -3.333333e-01 -1.7231536963 1.056487030 1.0000000
## 1237x500-1140x250 -4.197531e-01 -1.4115622978 0.572056125 0.9998913
## 1280x1280-1140x250 -6.666667e-01 -2.6321694733 1.298836140 0.9999996
## 160x600-1140x250 -8.543923e-02 -1.0734980601 0.902619600 1.0000000
## 180x150-1140x250 -4.166667e-01 -1.7167245735 0.883391240 0.9999999
## 1x1-1140x250 -3.012821e-01 -1.3119845644 0.709420462 1.0000000
## 250x250-1140x250 -1.666667e-01 -1.4667245735 1.133391240 1.0000000
## 300x100-1140x250 -5.000000e-01 -1.5987494714 0.598749471 0.9994803
## 300x1050-1140x250 -5.402299e-01 -1.5286131477 0.448153378 0.9855931
## 300x250-1140x250 -3.154104e-01 -1.2981730123 0.667352242 0.9999999
## 300x251-1140x250 -6.008359e-01 -1.5881976232 0.386525732 0.9362829
## 300x50-1140x250 -3.874194e-01 -1.3702361988 0.595397451 0.9999792
## 300x600-1140x250 -2.715131e-01 -1.2544674951 0.711441197 1.0000000
## 320x100-1140x250 -5.833333e-01 -1.5732844769 0.406617810 0.9576473
## 320x106-1140x250 3.333333e-01 -1.2205330708 1.887199737 1.0000000
## 320x107-1140x250 1.904762e-01 -0.8924631117 1.273415493 1.0000000
## 320x480-1140x250 -4.000000e-01 -1.4765502240 0.676550224 0.9999952
## 320x50-1140x250 -4.418984e-01 -1.4246553726 0.540858478 0.9995948
## 325x508-1140x250 -5.064103e-01 -1.4938751227 0.481054610 0.9949435
## 336x280-1140x250 -3.062016e-01 -1.2946501110 0.682247010 1.0000000
## 468x60-1140x250 -4.410029e-01 -1.4302555359 0.548249636 0.9996621
## 480x360-1140x250 -6.666667e-02 -1.3097597899 1.176426457 1.0000000
## 600x300-1140x250 -3.333333e-01 -1.7231536963 1.056487030 1.0000000
## 620x366-1140x250 -4.666667e-01 -1.4585885773 0.525255244 0.9989816
## 640x360-1140x250 -1.001088e-12 -1.2036197411 1.203619741 1.0000000
## 640x480-1140x250 1.245421e-01 -0.8742771242 1.123361373 1.0000000
## 728x90-1140x250 -2.660915e-01 -1.2490548735 0.716871792 1.0000000
## 750x570-1140x250 3.333333e-01 -1.6321694733 2.298836140 1.0000000
## 970x250-1140x250 -1.526104e-01 -1.1359536858 0.830732802 1.0000000
## 970x500-1140x250 -6.666667e-01 -1.8014502412 0.468116908 0.9593017
## 970x66-1140x250 -6.666667e-01 -2.6321694733 1.298836140 0.9999996
## 970x90-1140x250 -3.872549e-01 -1.3807863507 0.606276547 0.9999842
## 1140x635-1140x600 1.000000e+00 -0.9655028066 2.965502807 0.9955999
## 1200x250-1140x600 3.333333e-01 -1.6321694733 2.298836140 1.0000000
## 1237x500-1140x600 2.469136e-01 -1.4605073266 1.954334487 1.0000000
## 1280x1280-1140x600 -1.793121e-12 -2.4072394821 2.407239482 1.0000000
## 160x600-1140x600 5.812274e-01 -1.1240176756 2.286472549 0.9999995
## 180x150-1140x600 2.500000e-01 -1.6530899092 2.153089909 1.0000000
## 1x1-1140x600 3.653846e-01 -1.3530798769 2.083849108 1.0000000
## 250x250-1140x600 5.000000e-01 -1.4030899092 2.403089909 1.0000000
## 300x100-1140x600 1.666667e-01 -1.6050136212 1.938346954 1.0000000
## 300x1050-1140x600 1.264368e-01 -1.5789963350 1.831869898 1.0000000
## 300x250-1140x600 3.512563e-01 -1.3509255604 2.053438123 1.0000000
## 300x251-1140x600 6.583072e-02 -1.6390105396 1.770671982 1.0000000
## 300x50-1140x600 2.792473e-01 -1.4229658410 1.981460426 1.0000000
## 300x600-1140x600 3.951535e-01 -1.3071390212 2.097446057 1.0000000
## 320x100-1140x600 8.333333e-02 -1.6230089266 1.789675593 1.0000000
## 320x106-1140x600 1.000000e+00 -1.0847305445 3.084730545 0.9985134
## 320x107-1140x600 8.571429e-01 -0.9047760325 2.619061747 0.9980559
## 320x480-1140x600 2.666667e-01 -1.4913324876 2.024665821 1.0000000
## 320x50-1140x600 2.247682e-01 -1.4774103305 1.926946769 1.0000000
## 325x508-1140x600 1.602564e-01 -1.5446446143 1.865157435 1.0000000
## 336x280-1140x600 3.604651e-01 -1.3450058446 2.065936077 1.0000000
## 468x60-1140x600 2.256637e-01 -1.4802733630 1.931600797 1.0000000
## 480x360-1140x600 6.000000e-01 -1.2646396849 2.464639685 0.9999999
## 600x300-1140x600 3.333333e-01 -1.6321694733 2.298836140 1.0000000
## 620x366-1140x600 2.000000e-01 -1.5074863742 1.907486374 1.0000000
## 640x360-1140x600 6.666667e-01 -1.1718928571 2.505226190 0.9999974
## 640x480-1140x600 7.912088e-01 -0.9202936286 2.502711211 0.9992829
## 728x90-1140x600 4.005751e-01 -1.3017226022 2.102872854 1.0000000
## 750x570-1140x600 1.000000e+00 -1.4072394821 3.407239482 0.9999287
## 970x250-1140x600 5.140562e-01 -1.1884609048 2.216573355 1.0000000
## 970x500-1140x600 -1.912692e-12 -1.7942503734 1.794250373 1.0000000
## 970x66-1140x600 -1.686873e-12 -2.4072394821 2.407239482 1.0000000
## 970x90-1140x600 2.794118e-01 -1.4290101331 1.987833663 1.0000000
## 1200x250-1140x635 -6.666667e-01 -2.0564870297 0.723153696 0.9985134
## 1237x500-1140x635 -7.530864e-01 -1.7448956312 0.238722792 0.5508337
## 1280x1280-1140x635 -1.000000e+00 -2.9655028066 0.965502807 0.9955999
## 160x600-1140x635 -4.187726e-01 -1.4068313934 0.569286267 0.9998877
## 180x150-1140x635 -7.500000e-01 -2.0500579069 0.550057907 0.9681971
## 1x1-1140x635 -6.346154e-01 -1.6453178977 0.376087128 0.9080547
## 250x250-1140x635 -5.000000e-01 -1.8000579069 0.800057907 0.9999886
## 300x100-1140x635 -8.333333e-01 -1.9320828047 0.265416138 0.5537166
## 300x1050-1140x635 -8.735632e-01 -1.8619464810 0.114820044 0.1990170
## 300x250-1140x635 -6.487437e-01 -1.6315063456 0.334018908 0.8449224
## 300x251-1140x635 -9.341693e-01 -1.9215309566 0.053192399 0.0994025
## 300x50-1140x635 -7.207527e-01 -1.7035695321 0.262064118 0.6365221
## 300x600-1140x635 -6.048465e-01 -1.5878008285 0.378107864 0.9271664
## 320x100-1140x635 -9.166667e-01 -1.9066178102 0.073284477 0.1259753
## 320x106-1140x635 1.960654e-13 -1.5538664041 1.553866404 1.0000000
## 320x107-1140x635 -1.428571e-01 -1.2257964451 0.940082159 1.0000000
## 320x480-1140x635 -7.333333e-01 -1.8098835574 0.343216891 0.7929938
## 320x50-1140x635 -7.752318e-01 -1.7579887059 0.207525145 0.4539237
## 325x508-1140x635 -8.397436e-01 -1.8272084561 0.147721277 0.2751788
## 336x280-1140x635 -6.395349e-01 -1.6279834443 0.348913677 0.8730680
## 468x60-1140x635 -7.743363e-01 -1.7635888692 0.214916303 0.4735792
## 480x360-1140x635 -4.000000e-01 -1.6430931233 0.843093123 0.9999999
## 600x300-1140x635 -6.666667e-01 -2.0564870297 0.723153696 0.9985134
## 620x366-1140x635 -8.000000e-01 -1.7919219106 0.191921911 0.3983862
## 640x360-1140x635 -3.333333e-01 -1.5369530744 0.870286408 1.0000000
## 640x480-1140x635 -2.087912e-01 -1.2076104576 0.790028040 1.0000000
## 728x90-1140x635 -5.994249e-01 -1.5823882068 0.383538458 0.9346245
## 750x570-1140x635 3.261835e-13 -1.9655028066 1.965502807 1.0000000
## 970x250-1140x635 -4.859438e-01 -1.4692870191 0.497399469 0.9973955
## 970x500-1140x635 -1.000000e+00 -2.1347835745 0.134783574 0.2043646
## 970x66-1140x635 -1.000000e+00 -2.9655028066 0.965502807 0.9955999
## 970x90-1140x635 -7.205882e-01 -1.7141196840 0.272943213 0.6625897
## 1237x500-1200x250 -8.641975e-02 -1.0782289645 0.905389458 1.0000000
## 1280x1280-1200x250 -3.333333e-01 -2.2988361400 1.632169473 1.0000000
## 160x600-1200x250 2.478941e-01 -0.7401647267 1.235952934 1.0000000
## 180x150-1200x250 -8.333333e-02 -1.3833912402 1.216724574 1.0000000
## 1x1-1200x250 3.205128e-02 -0.9786512310 1.042753795 1.0000000
## 250x250-1200x250 1.666667e-01 -1.1333912402 1.466724574 1.0000000
## 300x100-1200x250 -1.666667e-01 -1.2654161381 0.932082805 1.0000000
## 300x1050-1200x250 -2.068966e-01 -1.1952798143 0.781486711 1.0000000
## 300x250-1200x250 1.792295e-02 -0.9648396789 1.000685575 1.0000000
## 300x251-1200x250 -2.675026e-01 -1.2548642899 0.719859065 1.0000000
## 300x50-1200x250 -5.408604e-02 -1.0369028654 0.928730784 1.0000000
## 300x600-1200x250 6.182018e-02 -0.9211341618 1.044774531 1.0000000
## 320x100-1200x250 -2.500000e-01 -1.2399511435 0.739951144 1.0000000
## 320x106-1200x250 6.666667e-01 -0.8871997374 2.220533071 0.9998526
## 320x107-1200x250 5.238095e-01 -0.5591297784 1.606748826 0.9982565
## 320x480-1200x250 -6.666667e-02 -1.1432168907 1.009883557 1.0000000
## 320x50-1200x250 -1.085651e-01 -1.0913220393 0.874191811 1.0000000
## 325x508-1200x250 -1.730769e-01 -1.1605417894 0.814387943 1.0000000
## 336x280-1200x250 2.713178e-02 -0.9613167777 1.015580344 1.0000000
## 468x60-1200x250 -1.076696e-01 -1.0969222025 0.881582970 1.0000000
## 480x360-1200x250 2.666667e-01 -0.9764264566 1.509759790 1.0000000
## 600x300-1200x250 1.774136e-13 -1.3898203630 1.389820363 1.0000000
## 620x366-1200x250 -1.333333e-01 -1.1252552439 0.858588577 1.0000000
## 640x360-1200x250 3.333333e-01 -0.8702864077 1.536953074 1.0000000
## 640x480-1200x250 4.578755e-01 -0.5409437909 1.456694707 0.9993957
## 728x90-1200x250 6.724179e-02 -0.9157215401 1.050205125 1.0000000
## 750x570-1200x250 6.666667e-01 -1.2988361400 2.632169473 0.9999996
## 970x250-1200x250 1.807229e-01 -0.8026203524 1.164066136 1.0000000
## 970x500-1200x250 -3.333333e-01 -1.4681169078 0.801450241 1.0000000
## 970x66-1200x250 -3.333333e-01 -2.2988361400 1.632169473 1.0000000
## 970x90-1200x250 -5.392157e-02 -1.0474530173 0.939609880 1.0000000
## 1280x1280-1237x500 -2.469136e-01 -1.9543344871 1.460507327 1.0000000
## 160x600-1237x500 3.343139e-01 0.1659537592 0.502673954 0.0000000
## 180x150-1237x500 3.086420e-03 -0.8584444454 0.864617285 1.0000000
## 1x1-1237x500 1.184710e-01 -0.1528303478 0.389772418 0.9997830
## 250x250-1237x500 2.530864e-01 -0.6084444454 1.114617285 1.0000000
## 300x100-1237x500 -8.024691e-02 -0.5894966277 0.429002801 1.0000000
## 300x1050-1237x500 -1.204768e-01 -0.2907305629 0.049776966 0.7176504
## 300x250-1237x500 1.043427e-01 -0.0294752776 0.238160680 0.4833911
## 300x251-1237x500 -1.810829e-01 -0.3453020809 -0.016863638 0.0110835
## 300x50-1237x500 3.233371e-02 -0.1018817174 0.166549142 1.0000000
## 300x600-1237x500 1.482399e-01 0.0130211618 0.283458713 0.0122583
## 320x100-1237x500 -1.635802e-01 -0.3427118662 0.015551372 0.1455313
## 320x106-1237x500 7.530864e-01 -0.4579402819 1.964113121 0.9176190
## 320x107-1237x500 6.102293e-01 0.1360537791 1.084404775 0.0004282
## 320x480-1237x500 1.975309e-02 -0.4396435344 0.479149707 1.0000000
## 320x50-1237x500 -2.214536e-02 -0.1559214598 0.111630738 1.0000000
## 325x508-1237x500 -8.665717e-02 -0.2514956748 0.078181335 0.9922924
## 336x280-1237x500 1.135515e-01 -0.0570808976 0.284183970 0.8324911
## 468x60-1237x500 -2.124986e-02 -0.1964798533 0.153980126 1.0000000
## 480x360-1237x500 3.530864e-01 -0.4198077409 1.125980580 0.9994370
## 600x300-1237x500 8.641975e-02 -0.9053894583 1.078228964 1.0000000
## 620x366-1237x500 -4.691358e-02 -0.2366342885 0.142807128 1.0000000
## 640x360-1237x500 4.197531e-01 -0.2879088042 1.127414977 0.9538282
## 640x480-1237x500 5.442952e-01 0.3213047131 0.767285709 0.0000000
## 728x90-1237x500 1.536615e-01 0.0183774606 0.288945631 0.0064827
## 750x570-1237x500 7.530864e-01 -0.9543344871 2.460507327 0.9997307
## 970x250-1237x500 2.671426e-01 0.1291252328 0.405160056 0.0000000
## 970x500-1237x500 -2.469136e-01 -0.8298532267 0.336026066 0.9998892
## 970x66-1237x500 -2.469136e-01 -1.9543344871 1.460507327 1.0000000
## 970x90-1237x500 3.249818e-02 -0.1654654737 0.230461843 1.0000000
## 160x600-1280x1280 5.812274e-01 -1.1240176756 2.286472549 0.9999995
## 180x150-1280x1280 2.500000e-01 -1.6530899092 2.153089909 1.0000000
## 1x1-1280x1280 3.653846e-01 -1.3530798769 2.083849108 1.0000000
## 250x250-1280x1280 5.000000e-01 -1.4030899092 2.403089909 1.0000000
## 300x100-1280x1280 1.666667e-01 -1.6050136212 1.938346955 1.0000000
## 300x1050-1280x1280 1.264368e-01 -1.5789963350 1.831869898 1.0000000
## 300x250-1280x1280 3.512563e-01 -1.3509255604 2.053438123 1.0000000
## 300x251-1280x1280 6.583072e-02 -1.6390105396 1.770671982 1.0000000
## 300x50-1280x1280 2.792473e-01 -1.4229658410 1.981460426 1.0000000
## 300x600-1280x1280 3.951535e-01 -1.3071390212 2.097446057 1.0000000
## 320x100-1280x1280 8.333333e-02 -1.6230089266 1.789675593 1.0000000
## 320x106-1280x1280 1.000000e+00 -1.0847305445 3.084730545 0.9985134
## 320x107-1280x1280 8.571429e-01 -0.9047760325 2.619061747 0.9980559
## 320x480-1280x1280 2.666667e-01 -1.4913324876 2.024665821 1.0000000
## 320x50-1280x1280 2.247682e-01 -1.4774103305 1.926946769 1.0000000
## 325x508-1280x1280 1.602564e-01 -1.5446446143 1.865157435 1.0000000
## 336x280-1280x1280 3.604651e-01 -1.3450058446 2.065936077 1.0000000
## 468x60-1280x1280 2.256637e-01 -1.4802733630 1.931600797 1.0000000
## 480x360-1280x1280 6.000000e-01 -1.2646396849 2.464639685 0.9999999
## 600x300-1280x1280 3.333333e-01 -1.6321694733 2.298836140 1.0000000
## 620x366-1280x1280 2.000000e-01 -1.5074863742 1.907486374 1.0000000
## 640x360-1280x1280 6.666667e-01 -1.1718928571 2.505226190 0.9999974
## 640x480-1280x1280 7.912088e-01 -0.9202936286 2.502711211 0.9992829
## 728x90-1280x1280 4.005751e-01 -1.3017226022 2.102872854 1.0000000
## 750x570-1280x1280 1.000000e+00 -1.4072394821 3.407239482 0.9999287
## 970x250-1280x1280 5.140562e-01 -1.1884609048 2.216573355 1.0000000
## 970x500-1280x1280 -1.195710e-13 -1.7942503734 1.794250373 1.0000000
## 970x66-1280x1280 1.062483e-13 -2.4072394821 2.407239482 1.0000000
## 970x90-1280x1280 2.794118e-01 -1.4290101331 1.987833663 1.0000000
## 180x150-160x600 -3.312274e-01 -1.1884381296 0.525983256 0.9999872
## 1x1-160x600 -2.158428e-01 -0.4730959663 0.041410323 0.3047494
## 250x250-160x600 -8.122744e-02 -0.9384381296 0.775983256 1.0000000
## 300x100-160x600 -4.145608e-01 -0.9164671473 0.087345607 0.3408259
## 300x1050-160x600 -4.547907e-01 -0.6016275865 -0.307953724 0.0000000
## 300x250-160x600 -2.299712e-01 -0.3323527533 -0.127589558 0.0000000
## 300x251-160x600 -5.153967e-01 -0.6551918978 -0.375601534 0.0000000
## 300x50-160x600 -3.019801e-01 -0.4048806871 -0.199079601 0.0000000
## 300x600-160x600 -1.860739e-01 -0.2902797608 -0.081868077 0.0000000
## 320x100-160x600 -4.978941e-01 -0.6549382921 -0.340849915 0.0000000
## 320x106-160x600 4.187726e-01 -0.7891845604 1.626729687 0.9999992
## 320x107-160x600 2.759154e-01 -0.1903646917 0.742195532 0.9552445
## 320x480-160x600 -3.145608e-01 -0.7658034935 0.136681953 0.7488471
## 320x50-160x600 -3.564592e-01 -0.4587860702 -0.254132365 0.0000000
## 325x508-160x600 -4.209710e-01 -0.5614931701 -0.280448883 0.0000000
## 336x280-160x600 -2.207623e-01 -0.3680381419 -0.073486499 0.0000048
## 468x60-160x600 -3.555637e-01 -0.5081425177 -0.202984922 0.0000000
## 480x360-160x600 1.877256e-02 -0.7493030345 0.786848161 1.0000000
## 600x300-160x600 -2.478941e-01 -1.2359529337 0.740164727 1.0000000
## 620x366-160x600 -3.812274e-01 -0.5502501787 -0.212204695 0.0000000
## 640x360-160x600 8.543923e-02 -0.6169567351 0.787835195 1.0000000
## 640x480-160x600 2.099814e-01 0.0043129264 0.415649782 0.0375052
## 728x90-160x600 -1.806523e-01 -0.2849428848 -0.076361737 0.0000000
## 750x570-160x600 4.187726e-01 -1.2864725492 2.124017676 1.0000000
## 970x250-160x600 -6.717121e-02 -0.1749837627 0.040641339 0.9158063
## 970x500-160x600 -5.812274e-01 -1.1577631005 -0.004691773 0.0447783
## 970x66-160x600 -5.812274e-01 -2.2864725492 1.124017676 0.9999995
## 970x90-160x600 -3.018157e-01 -0.4800412318 -0.123590112 0.0000000
## 1x1-180x150 1.153846e-01 -0.7678308112 0.998600042 1.0000000
## 250x250-180x150 2.500000e-01 -0.9536197411 1.453619741 1.0000000
## 300x100-180x150 -8.333333e-02 -1.0660847366 0.899418070 1.0000000
## 300x1050-180x150 -1.235632e-01 -0.9811478461 0.734021409 1.0000000
## 300x250-180x150 1.012563e-01 -0.7498443594 0.952356922 1.0000000
## 300x251-180x150 -1.841693e-01 -1.0405763097 0.672237752 1.0000000
## 300x50-180x150 2.924729e-02 -0.8219159296 0.880410515 1.0000000
## 300x600-180x150 1.451535e-01 -0.7061684934 0.996475529 1.0000000
## 320x100-180x150 -1.666667e-01 -1.0260578386 0.692724505 1.0000000
## 320x106-180x150 7.500000e-01 -0.7241271050 2.224127105 0.9955999
## 320x107-180x150 6.071429e-01 -0.3578998632 1.572185577 0.9060021
## 320x480-180x150 1.666667e-02 -0.9412009153 0.974534249 1.0000000
## 320x50-180x150 -2.523178e-02 -0.8763258377 0.825862276 1.0000000
## 325x508-180x150 -8.974359e-02 -0.9462695859 0.766782406 1.0000000
## 336x280-180x150 1.104651e-01 -0.7471947678 0.968125000 1.0000000
## 468x60-180x150 -2.433628e-02 -0.8829226788 0.834250112 1.0000000
## 480x360-180x150 3.500000e-01 -0.7918539455 1.491853946 1.0000000
## 600x300-180x150 8.333333e-02 -1.2167245735 1.383391240 1.0000000
## 620x366-180x150 -5.000000e-02 -0.9116606040 0.811660604 1.0000000
## 640x360-180x150 4.166667e-01 -0.6820828047 1.515416138 0.9999919
## 640x480-180x150 5.412088e-01 -0.3283829587 1.410800541 0.9168332
## 728x90-180x150 1.505751e-01 -0.7007572611 1.001907513 1.0000000
## 750x570-180x150 7.500000e-01 -1.1530899092 2.653089909 0.9999793
## 970x250-180x150 2.640562e-01 -0.5877147862 1.115827236 1.0000000
## 970x500-180x150 -2.500000e-01 -1.2728800911 0.772880091 1.0000000
## 970x66-180x150 -2.500000e-01 -2.1530899092 1.653089909 1.0000000
## 970x90-180x150 2.941176e-02 -0.8341012111 0.892924741 1.0000000
## 250x250-1x1 1.346154e-01 -0.7486000420 1.017830811 1.0000000
## 300x100-1x1 -1.987179e-01 -0.7438503450 0.346414448 0.9999970
## 300x1050-1x1 -2.389478e-01 -0.4974442600 0.019548592 0.1282976
## 300x250-1x1 -1.412833e-02 -0.2502243093 0.221967641 1.0000000
## 300x251-1x1 -2.995539e-01 -0.5541162854 -0.044991503 0.0032433
## 300x50-1x1 -8.613732e-02 -0.3224587981 0.150184153 0.9999970
## 300x600-1x1 2.976890e-02 -0.2071238485 0.266661653 1.0000000
## 320x100-1x1 -2.820513e-01 -0.5464793350 -0.017623229 0.0193323
## 320x106-1x1 6.346154e-01 -0.5919325070 1.861163276 0.9941098
## 320x107-1x1 4.917582e-01 -0.0207614758 1.004277959 0.0844267
## 320x480-1x1 -9.871795e-02 -0.5975960034 0.400160106 1.0000000
## 320x50-1x1 -1.406164e-01 -0.3766886366 0.095455845 0.9512550
## 325x508-1x1 -2.051282e-01 -0.4600905373 0.049834127 0.4044442
## 336x280-1x1 -4.919499e-03 -0.2636654857 0.253826487 1.0000000
## 468x60-1x1 -1.397209e-01 -0.4015216005 0.122079803 0.9901283
## 480x360-1x1 2.346154e-01 -0.5623786189 1.031609388 1.0000000
## 600x300-1x1 -3.205128e-02 -1.0427537951 0.978651231 1.0000000
## 620x366-1x1 -1.653846e-01 -0.4370977091 0.106328478 0.9360951
## 640x360-1x1 3.012821e-01 -0.4326248976 1.035189000 0.9999457
## 640x480-1x1 4.258242e-01 0.1299207789 0.721727573 0.0000185
## 728x90-1x1 3.519051e-02 -0.2017395252 0.272120546 1.0000000
## 750x570-1x1 6.346154e-01 -1.0838491077 2.353079877 0.9999959
## 970x250-1x1 1.486716e-01 -0.0898296779 0.387172897 0.9153088
## 970x500-1x1 -3.653846e-01 -0.9799191462 0.249149915 0.9523800
## 970x66-1x1 -3.653846e-01 -2.0838491077 1.353079877 1.0000000
## 970x90-1x1 -8.597285e-02 -0.3635042329 0.191558532 1.0000000
## 300x100-250x250 -3.333333e-01 -1.3160847366 0.649418070 0.9999996
## 300x1050-250x250 -3.735632e-01 -1.2311478461 0.484021409 0.9997943
## 300x250-250x250 -1.487437e-01 -0.9998443594 0.702356922 1.0000000
## 300x251-250x250 -4.341693e-01 -1.2905763097 0.422237752 0.9958680
## 300x50-250x250 -2.207527e-01 -1.0719159296 0.630410515 1.0000000
## 300x600-250x250 -1.048465e-01 -0.9561684934 0.746475529 1.0000000
## 320x100-250x250 -4.166667e-01 -1.2760578386 0.442724505 0.9981763
## 320x106-250x250 5.000000e-01 -0.9741271050 1.974127105 0.9999996
## 320x107-250x250 3.571429e-01 -0.6078998632 1.322185577 0.9999956
## 320x480-250x250 -2.333333e-01 -1.1912009153 0.724534249 1.0000000
## 320x50-250x250 -2.752318e-01 -1.1263258377 0.575862276 0.9999999
## 325x508-250x250 -3.397436e-01 -1.1962695859 0.516782406 0.9999758
## 336x280-250x250 -1.395349e-01 -0.9971947678 0.718125000 1.0000000
## 468x60-250x250 -2.743363e-01 -1.1329226788 0.584250112 0.9999999
## 480x360-250x250 1.000000e-01 -1.0418539455 1.241853946 1.0000000
## 600x300-250x250 -1.666667e-01 -1.4667245735 1.133391240 1.0000000
## 620x366-250x250 -3.000000e-01 -1.1616606040 0.561660604 0.9999991
## 640x360-250x250 1.666667e-01 -0.9320828047 1.265416138 1.0000000
## 640x480-250x250 2.912088e-01 -0.5783829587 1.160800541 0.9999997
## 728x90-250x250 -9.942487e-02 -0.9507572611 0.751907513 1.0000000
## 750x570-250x250 5.000000e-01 -1.4030899092 2.403089909 1.0000000
## 970x250-250x250 1.405622e-02 -0.8377147862 0.865827236 1.0000000
## 970x500-250x250 -5.000000e-01 -1.5228800911 0.522880091 0.9978730
## 970x66-250x250 -5.000000e-01 -2.4030899092 1.403089909 1.0000000
## 970x90-250x250 -2.205882e-01 -1.0841012111 0.642924741 1.0000000
## 300x1050-300x100 -4.022989e-02 -0.5427746426 0.462314872 1.0000000
## 300x250-300x100 1.845896e-01 -0.3068085339 0.675987763 0.9999936
## 300x251-300x100 -1.008359e-01 -0.6013685039 0.399696613 1.0000000
## 300x50-300x100 1.125806e-01 -0.3789259056 0.604087158 1.0000000
## 300x600-300x100 2.284869e-01 -0.2632946108 0.720268313 0.9992067
## 320x100-300x100 -8.333333e-02 -0.5889547590 0.422288092 1.0000000
## 320x106-300x100 8.333333e-01 -0.4667245735 2.133391240 0.8848482
## 320x107-300x100 6.904762e-01 0.0208439841 1.360108397 0.0325036
## 320x480-300x100 1.000000e-01 -0.5592496828 0.759249683 1.0000000
## 320x50-300x100 5.810155e-02 -0.4332851928 0.549488298 1.0000000
## 325x508-300x100 -6.410256e-03 -0.5071463363 0.494325823 1.0000000
## 336x280-300x100 1.937984e-01 -0.3088747210 0.696471620 0.9999879
## 468x60-300x100 5.899705e-02 -0.4452553057 0.563249406 1.0000000
## 480x360-300x100 4.333333e-01 -0.4727186920 1.339385359 0.9985956
## 600x300-300x100 1.666667e-01 -0.9320828047 1.265416138 1.0000000
## 620x366-300x100 3.333333e-02 -0.4761358377 0.542802504 1.0000000
## 640x360-300x100 5.000000e-01 -0.3510876809 1.351087681 0.9593017
## 640x480-300x100 6.245421e-01 0.1017709976 1.147313252 0.0023690
## 728x90-300x100 2.339085e-01 -0.2578909641 0.725707882 0.9987390
## 750x570-300x100 8.333333e-01 -0.9383469545 2.605013621 0.9989860
## 970x250-300x100 3.473896e-01 -0.1451687581 0.839947875 0.7247470
## 970x500-300x100 -1.666667e-01 -0.9172554492 0.583922116 1.0000000
## 970x66-300x100 -1.666667e-01 -1.9383469545 1.605013621 1.0000000
## 970x90-300x100 1.127451e-01 -0.3998507457 0.625340942 1.0000000
## 300x250-300x1050 2.248195e-01 0.1193528538 0.330286146 0.0000000
## 300x251-300x1050 -6.060606e-02 -0.2026761657 0.081464044 0.9998701
## 300x50-300x1050 1.528105e-01 0.0468400265 0.258780996 0.0000173
## 300x600-300x1050 2.687167e-01 0.1614783135 0.375955159 0.0000000
## 320x100-300x1050 -4.310345e-02 -0.2021760701 0.115969174 1.0000000
## 320x106-300x1050 8.735632e-01 -0.3346592918 2.081785729 0.6698159
## 320x107-300x1050 7.307061e-01 0.2637388768 1.197673274 0.0000010
## 320x480-300x1050 1.402299e-01 -0.3117227864 0.592182557 1.0000000
## 320x50-300x1050 9.833144e-02 -0.0070820653 0.203744941 0.1163860
## 325x508-300x1050 3.381963e-02 -0.1089658561 0.176605113 1.0000000
## 336x280-300x1050 2.340283e-01 0.0845914269 0.383465242 0.0000010
## 468x60-300x1050 9.922694e-02 -0.0554388702 0.253892741 0.8837810
## 480x360-300x1050 4.735632e-01 -0.2949296869 1.242056124 0.9259139
## 600x300-300x1050 2.068966e-01 -0.7814867109 1.195279814 1.0000000
## 620x366-300x1050 7.356322e-02 -0.0973458484 0.244472285 0.9998417
## 640x360-300x1050 5.402299e-01 -0.1626223850 1.243082155 0.5199266
## 640x480-300x1050 6.647720e-01 0.4575505722 0.871993447 0.0000000
## 728x90-300x1050 2.741383e-01 0.1668175837 0.381459105 0.0000000
## 750x570-300x1050 8.735632e-01 -0.8318698982 2.578996335 0.9950479
## 970x250-300x1050 3.876194e-01 0.2768730305 0.498365856 0.0000000
## 970x500-300x1050 -1.264368e-01 -0.7035282764 0.450654713 1.0000000
## 970x66-300x1050 -1.264368e-01 -1.8318698982 1.578996335 1.0000000
## 970x90-300x1050 1.529750e-01 -0.0270404939 0.332990460 0.2767329
## 300x251-300x250 -2.854256e-01 -0.3808447511 -0.190006370 0.0000000
## 300x50-300x250 -7.200899e-02 -0.0842829775 -0.059735000 0.0000000
## 300x600-300x250 4.389724e-02 0.0233792808 0.064415192 0.0000000
## 320x100-300x250 -2.679229e-01 -0.3871915785 -0.148654318 0.0000000
## 320x106-300x250 6.487437e-01 -0.5548851866 1.852372624 0.9884321
## 320x107-300x250 5.058866e-01 0.0509368292 0.960836322 0.0095620
## 320x480-300x250 -8.458961e-02 -0.5241144996 0.354935270 1.0000000
## 320x50-300x250 -1.264881e-01 -0.1322251194 -0.120751005 0.0000000
## 325x508-300x250 -1.909999e-01 -0.2874809670 -0.094518775 0.0000000
## 336x280-300x250 9.208835e-03 -0.0968680080 0.115285678 1.0000000
## 468x60-300x250 -1.255926e-01 -0.2389169598 -0.012268169 0.0101453
## 480x360-300x250 2.487437e-01 -0.5125067348 1.009994172 0.9999998
## 600x300-300x250 -1.792295e-02 -1.0006855751 0.964839679 1.0000000
## 620x366-300x250 -1.512563e-01 -0.2859070011 -0.016605562 0.0079595
## 640x360-300x250 3.154104e-01 -0.3795156688 1.010336439 0.9995077
## 640x480-300x250 4.399525e-01 0.2614542428 0.618450777 0.0000000
## 728x90-300x250 4.931884e-02 0.0283748040 0.070262885 0.0000000
## 750x570-300x250 6.487437e-01 -1.0534381232 2.350925560 0.9999909
## 970x250-300x250 1.627999e-01 0.1283663095 0.197233578 0.0000000
## 970x500-300x250 -3.512563e-01 -0.9186675084 0.216154946 0.9218270
## 970x66-300x250 -3.512563e-01 -2.0534381232 1.350925560 1.0000000
## 970x90-300x250 -7.184452e-02 -0.2178803998 0.074191366 0.9976026
## 300x50-300x251 2.134166e-01 0.1174407825 0.309392361 0.0000000
## 300x600-300x251 3.293228e-01 0.2319488381 0.426696756 0.0000000
## 320x100-300x251 1.750261e-02 -0.1350939392 0.170099164 1.0000000
## 320x106-300x251 9.341693e-01 -0.2732176692 2.141556227 0.5031424
## 320x107-300x251 7.913121e-01 0.3265111322 1.256113140 0.0000000
## 320x480-300x251 2.008359e-01 -0.2488782144 0.650550106 0.9996491
## 320x50-300x251 1.589375e-01 0.0635770498 0.254297947 0.0000001
## 325x508-300x251 9.442569e-02 -0.0411077132 0.229959092 0.7500089
## 336x280-300x251 2.946344e-01 0.1521107203 0.437158070 0.0000000
## 468x60-300x251 1.598330e-01 0.0118359692 0.307830022 0.0157903
## 480x360-300x251 5.341693e-01 -0.2330092883 1.301347846 0.7512179
## 600x300-300x251 2.675026e-01 -0.7198590652 1.254864290 1.0000000
## 620x366-300x251 1.341693e-01 -0.0307292281 0.299067786 0.3768010
## 640x360-300x251 6.008359e-01 -0.1005789971 1.302250888 0.2598202
## 640x480-300x251 7.253781e-01 0.5230853797 0.927670761 0.0000000
## 728x90-300x251 3.347444e-01 0.2372797745 0.432209035 0.0000000
## 750x570-300x251 9.341693e-01 -0.7706719816 2.639010540 0.9850243
## 970x250-300x251 4.482255e-01 0.3470011171 0.549449891 0.0000000
## 970x500-300x251 -6.583072e-02 -0.6411707960 0.509509354 1.0000000
## 970x66-300x251 -6.583072e-02 -1.7706719816 1.639010540 1.0000000
## 970x90-300x251 2.135810e-01 0.0392618523 0.387900235 0.0013679
## 300x600-300x50 1.159062e-01 0.0929384936 0.138873957 0.0000000
## 320x100-300x50 -1.959140e-01 -0.3156283545 -0.076199564 0.0000002
## 320x106-300x50 7.207527e-01 -0.4829204508 1.924425865 0.9478638
## 320x107-300x50 5.778956e-01 0.1228287538 1.032962375 0.0005939
## 320x480-300x50 -1.258063e-02 -0.4522266822 0.427065430 1.0000000
## 320x50-300x50 -5.447907e-02 -0.0662877147 -0.042670432 0.0000000
## 325x508-300x50 -1.189909e-01 -0.2160224858 -0.021959279 0.0013411
## 336x280-300x50 8.121782e-02 -0.0253599733 0.187795620 0.5417129
## 468x60-300x50 -5.358358e-02 -0.1673770236 0.060209872 0.9989636
## 480x360-300x50 3.207527e-01 -0.4405677134 1.082073128 0.9999018
## 600x300-300x50 5.408604e-02 -0.9287307843 1.036902865 1.0000000
## 620x366-300x50 -7.924729e-02 -0.2142930128 0.055798427 0.9599104
## 640x360-300x50 3.874194e-01 -0.3075833245 1.082422072 0.9805771
## 640x480-300x50 5.119615e-01 0.3331650737 0.690757923 0.0000000
## 728x90-300x50 1.213278e-01 0.0979786787 0.144676987 0.0000000
## 750x570-300x50 7.207527e-01 -0.9814604265 2.422965841 0.9998901
## 970x250-300x50 2.348089e-01 0.1988616745 0.270756190 0.0000000
## 970x500-300x50 -2.792473e-01 -0.8467523859 0.288257800 0.9975938
## 970x66-300x50 -2.792473e-01 -1.9814604265 1.422965841 1.0000000
## 970x90-300x50 1.644720e-04 -0.1462356965 0.146564640 1.0000000
## 320x100-300x600 -3.118202e-01 -0.4326583907 -0.190981978 0.0000000
## 320x106-300x600 6.048465e-01 -0.5989389668 1.808631931 0.9964767
## 320x107-300x600 4.619893e-01 0.0066255970 0.917353082 0.0410036
## 320x480-300x600 -1.284869e-01 -0.5684402468 0.311466544 1.0000000
## 320x50-300x600 -1.703853e-01 -0.1906283149 -0.150142282 0.0000000
## 325x508-300x600 -2.348971e-01 -0.3333118816 -0.136482334 0.0000000
## 336x280-300x600 -3.468840e-02 -0.1425269961 0.073150193 0.9999999
## 468x60-300x600 -1.694898e-01 -0.2844649481 -0.054514654 0.0000085
## 480x360-300x600 2.048465e-01 -0.5566514618 0.966344426 1.0000000
## 600x300-300x600 -6.182018e-02 -1.0447745308 0.921134162 1.0000000
## 620x366-300x600 -1.951535e-01 -0.3311964603 -0.059110575 0.0000204
## 640x360-300x600 2.715131e-01 -0.4236840078 0.966710305 0.9999834
## 640x480-300x600 3.960553e-01 0.2165044527 0.575606094 0.0000000
## 728x90-300x600 5.421608e-03 -0.0231356803 0.033978896 1.0000000
## 750x570-300x600 6.048465e-01 -1.0974460568 2.307139021 0.9999985
## 970x250-300x600 1.189027e-01 0.0793737172 0.158431697 0.0000000
## 970x500-300x600 -3.951535e-01 -0.9628967401 0.172589704 0.7519958
## 970x66-300x600 -3.951535e-01 -2.0974460568 1.307139021 1.0000000
## 970x90-300x600 -1.157418e-01 -0.2630623051 0.031578799 0.4641930
## 320x106-320x100 9.166667e-01 -0.2928387806 2.126172114 0.5555522
## 320x107-320x100 7.738095e-01 0.3035328420 1.244086206 0.0000001
## 320x480-320x100 1.833333e-01 -0.2720379548 0.638704621 0.9999656
## 320x50-320x100 1.414349e-01 0.0222132462 0.260656526 0.0027445
## 325x508-320x100 7.692308e-02 -0.0763397282 0.230185882 0.9965454
## 336x280-320x100 2.771318e-01 0.1176539407 0.436609625 0.0000000
## 468x60-320x100 1.423304e-01 -0.0220572807 0.306718048 0.2380222
## 480x360-320x100 5.166667e-01 -0.2538416969 1.287175030 0.8202053
## 600x300-320x100 2.500000e-01 -0.7399511435 1.239951144 1.0000000
## 620x366-320x100 1.166667e-01 -0.0630878935 0.296421227 0.8689412
## 640x360-320x100 5.833333e-01 -0.1217220590 1.288388726 0.3368935
## 640x480-320x100 7.078755e-01 0.4933002568 0.922450659 0.0000000
## 728x90-320x100 3.172418e-01 0.1963305093 0.438153076 0.0000000
## 750x570-320x100 9.166667e-01 -0.7896755933 2.623008927 0.9890262
## 970x250-320x100 4.307229e-01 0.3067609590 0.554684824 0.0000000
## 970x500-320x100 -8.333333e-02 -0.6631060355 0.496439369 1.0000000
## 970x66-320x100 -8.333333e-02 -1.7896755933 1.623008927 1.0000000
## 970x90-320x100 1.960784e-01 0.0076444516 0.384512411 0.0283922
## 320x107-320x106 -1.428571e-01 -1.4295807700 1.143866484 1.0000000
## 320x480-320x106 -7.333333e-01 -2.0146843958 0.548017729 0.9715508
## 320x50-320x106 -7.752318e-01 -1.9788560304 0.428392469 0.8788936
## 325x508-320x106 -8.397436e-01 -2.0472149238 0.367727744 0.7535795
## 336x280-320x106 -6.395349e-01 -1.8478108114 0.568741044 0.9913750
## 468x60-320x106 -7.743363e-01 -1.9832700445 0.434597478 0.8857681
## 480x360-320x106 -4.000000e-01 -1.8241420833 1.024142083 1.0000000
## 600x300-320x106 -6.666667e-01 -2.2205330708 0.887199737 0.9998526
## 620x366-320x106 -8.000000e-01 -2.0111190020 0.411119002 0.8439580
## 640x360-320x106 -3.333333e-01 -1.7231536963 1.056487030 1.0000000
## 640x480-320x106 -2.087912e-01 -1.4255656546 1.007983237 1.0000000
## 728x90-320x106 -5.994249e-01 -1.8032176610 0.604367913 0.9970063
## 750x570-320x106 1.301181e-13 -2.0847305445 2.084730545 1.0000000
## 970x250-320x106 -4.859438e-01 -1.6900468006 0.718159250 0.9999636
## 970x500-320x106 -1.000000e+00 -2.3306516905 0.330651690 0.5767240
## 970x66-320x106 -1.000000e+00 -3.0847305445 1.084730545 0.9985134
## 970x90-320x106 -7.205882e-01 -1.9330258213 0.491849351 0.9526289
## 320x480-320x107 -5.904762e-01 -1.2230244303 0.042072049 0.1154770
## 320x50-320x107 -6.323746e-01 -1.0873120677 -0.177437208 0.0000530
## 325x508-320x107 -6.968864e-01 -1.1619066111 -0.231866283 0.0000049
## 336x280-320x107 -4.966777e-01 -0.9637831335 -0.029572348 0.0203244
## 468x60-320x107 -6.314791e-01 -1.1002835450 -0.162674736 0.0001296
## 480x360-320x107 -2.571429e-01 -1.1439559175 0.629670203 1.0000000
## 600x300-320x107 -5.238095e-01 -1.6067488260 0.559129778 0.9982565
## 620x366-320x107 -6.571429e-01 -1.1315540370 -0.182731677 0.0000587
## 640x360-320x107 -1.904762e-01 -1.0210527203 0.640100339 1.0000000
## 640x480-320x107 -6.593407e-02 -0.5546024428 0.422734311 1.0000000
## 728x90-320x107 -4.565677e-01 -0.9119508712 -0.001184591 0.0482765
## 750x570-320x107 1.428571e-01 -1.6190617468 1.904776033 1.0000000
## 970x250-320x107 -3.430866e-01 -0.7992892478 0.113115983 0.5749405
## 970x500-320x107 -8.571429e-01 -1.5843915435 -0.129894171 0.0031397
## 970x66-320x107 -8.571429e-01 -2.6190617468 0.904776033 0.9980559
## 970x90-320x107 -5.777311e-01 -1.0554984316 -0.099963753 0.0018276
## 320x50-320x480 -4.189845e-02 -0.4814105833 0.397613689 1.0000000
## 325x508-320x480 -1.064103e-01 -0.5563509255 0.343530413 1.0000000
## 336x280-320x480 9.379845e-02 -0.3582970054 0.545893905 1.0000000
## 468x60-320x480 -4.100295e-02 -0.4948536107 0.412847711 1.0000000
## 480x360-320x480 3.333333e-01 -0.5456662438 1.212332910 0.9999919
## 600x300-320x480 6.666667e-02 -1.0098835574 1.143216891 1.0000000
## 620x366-320x480 -6.666667e-02 -0.5263065476 0.392973214 1.0000000
## 640x360-320x480 4.000000e-01 -0.4222288152 1.222228815 0.9980559
## 640x480-320x480 5.245421e-01 0.0502008601 0.998883389 0.0105445
## 728x90-320x480 1.339085e-01 -0.3060650135 0.573881932 1.0000000
## 750x570-320x480 7.333333e-01 -1.0246658209 2.491332488 0.9999216
## 970x250-320x480 2.473896e-01 -0.1934320372 0.688211154 0.9785436
## 970x500-320x480 -2.666667e-01 -0.9843668160 0.451033483 0.9999952
## 970x66-320x480 -2.666667e-01 -2.0246658209 1.491332488 1.0000000
## 970x90-320x480 1.274510e-02 -0.4503580039 0.475848200 1.0000000
## 325x508-320x50 -6.451181e-02 -0.1609348098 0.031911192 0.8238902
## 336x280-320x50 1.356969e-01 0.0296728911 0.241720903 0.0004907
## 468x60-320x50 8.954975e-04 -0.1123794413 0.114170436 1.0000000
## 480x360-320x50 3.752318e-01 -0.3860113119 1.136474873 0.9975149
## 600x300-320x50 1.085651e-01 -0.8741918113 1.091322039 1.0000000
## 620x366-320x50 -2.476822e-02 -0.1593773184 0.109840880 1.0000000
## 640x360-320x50 4.418984e-01 -0.2530195434 1.136816438 0.8942777
## 640x480-320x50 5.664406e-01 0.3879736995 0.744907444 0.0000000
## 728x90-320x50 1.758069e-01 0.1551321383 0.196481675 0.0000000
## 750x570-320x50 7.752318e-01 -0.9269467692 2.477410331 0.9994715
## 970x250-320x50 2.892880e-01 0.2550174881 0.323558523 0.0000000
## 970x500-320x50 -2.247682e-01 -0.7921695709 0.342633132 0.9999766
## 970x66-320x50 -2.247682e-01 -1.9269467692 1.477410331 1.0000000
## 970x90-320x50 5.464355e-02 -0.0913539627 0.200641053 0.9999942
## 336x280-325x508 2.002087e-01 0.0569719166 0.343445495 0.0000450
## 468x60-325x508 6.540731e-02 -0.0832765856 0.214091199 0.9997455
## 480x360-325x508 4.397436e-01 -0.3275677773 1.207054957 0.9709975
## 600x300-325x508 1.730769e-01 -0.8143879433 1.160541789 1.0000000
## 620x366-325x508 3.974359e-02 -0.1257716590 0.205258838 1.0000000
## 640x360-325x508 5.064103e-01 -0.1951499348 1.207970448 0.6735673
## 640x480-325x508 6.309524e-01 0.4281566400 0.833748122 0.0000000
## 728x90-325x508 2.403187e-01 0.1418142280 0.338823203 0.0000000
## 750x570-325x508 8.397436e-01 -0.8651574349 2.544644614 0.9975495
## 970x250-325x508 3.537998e-01 0.2515738082 0.456025821 0.0000000
## 970x500-325x508 -1.602564e-01 -0.7357735532 0.415260733 1.0000000
## 970x66-325x508 -1.602564e-01 -1.8651574349 1.544644614 1.0000000
## 970x90-325x508 1.191554e-01 -0.0557473625 0.294058071 0.7927865
## 468x60-336x280 -1.348014e-01 -0.2898839405 0.020281142 0.2302888
## 480x360-336x280 2.395349e-01 -0.5290420016 1.008111769 1.0000000
## 600x300-336x280 -2.713178e-02 -1.0155803436 0.961316778 1.0000000
## 620x366-336x280 -1.604651e-01 -0.3317514038 0.010821171 0.1110454
## 640x360-336x280 3.062016e-01 -0.3967425417 1.009145643 0.9997943
## 640x480-336x280 4.307437e-01 0.2232110093 0.638276341 0.0000000
## 728x90-336x280 4.011001e-02 -0.0678104649 0.148030484 0.9999951
## 750x570-336x280 6.395349e-01 -1.0659360772 2.345005845 0.9999939
## 970x250-336x280 1.535911e-01 0.0422634340 0.264918783 0.0000660
## 970x500-336x280 -3.604651e-01 -0.9376684397 0.216738207 0.9135157
## 970x66-336x280 -3.604651e-01 -2.0659360772 1.345005845 1.0000000
## 970x90-336x280 -8.105335e-02 -0.2614270058 0.099320303 0.9996002
## 480x360-468x60 3.743363e-01 -0.3952743656 1.143946932 0.9980623
## 600x300-468x60 1.076696e-01 -0.8815829695 1.096922203 1.0000000
## 620x366-468x60 -2.566372e-02 -0.2015304682 0.150203035 1.0000000
## 640x360-468x60 4.410029e-01 -0.2630712782 1.145077178 0.9105561
## 640x480-468x60 5.655451e-01 0.3542161146 0.776874034 0.0000000
## 728x90-468x60 1.749114e-01 0.0598594609 0.289963357 0.0000030
## 750x570-468x60 7.743363e-01 -0.9316007967 2.480273363 0.9995070
## 970x250-468x60 2.883925e-01 0.1701386579 0.406646358 0.0000000
## 970x500-468x60 -2.256637e-01 -0.8042428371 0.352915404 0.9999839
## 970x66-468x60 -2.256637e-01 -1.9316007967 1.480273363 1.0000000
## 970x90-468x60 5.374805e-02 -0.1309808808 0.238476977 1.0000000
## 600x300-480x360 -2.666667e-01 -1.5097597899 0.976426457 1.0000000
## 620x366-480x360 -4.000000e-01 -1.1730387755 0.373038776 0.9941022
## 640x360-480x360 6.666667e-02 -0.9640517007 1.097385034 1.0000000
## 640x480-480x360 1.912088e-01 -0.5906606118 0.973078194 1.0000000
## 728x90-480x360 -1.994249e-01 -0.9609344178 0.562084669 1.0000000
## 750x570-480x360 4.000000e-01 -1.4646396849 2.264639685 1.0000000
## 970x250-480x360 -8.594378e-02 -0.8479436486 0.676056098 1.0000000
## 970x500-480x360 -6.000000e-01 -1.5494280555 0.349428056 0.9012172
## 970x66-480x360 -6.000000e-01 -2.4646396849 1.264639685 0.9999999
## 970x90-480x360 -3.205882e-01 -1.0956912035 0.454514733 0.9999355
## 620x366-600x300 -1.333333e-01 -1.1252552439 0.858588577 1.0000000
## 640x360-600x300 3.333333e-01 -0.8702864077 1.536953074 1.0000000
## 640x480-600x300 4.578755e-01 -0.5409437909 1.456694707 0.9993957
## 728x90-600x300 6.724179e-02 -0.9157215401 1.050205125 1.0000000
## 750x570-600x300 6.666667e-01 -1.2988361400 2.632169473 0.9999996
## 970x250-600x300 1.807229e-01 -0.8026203525 1.164066136 1.0000000
## 970x500-600x300 -3.333333e-01 -1.4681169078 0.801450241 1.0000000
## 970x66-600x300 -3.333333e-01 -2.2988361400 1.632169473 1.0000000
## 970x90-600x300 -5.392157e-02 -1.0474530173 0.939609880 1.0000000
## 640x360-620x366 4.666667e-01 -0.2411531666 1.174486500 0.8467817
## 640x480-620x366 5.912088e-01 0.3677175676 0.814700015 0.0000000
## 728x90-620x366 2.005751e-01 0.0644672696 0.336682982 0.0000086
## 750x570-620x366 8.000000e-01 -0.9074863742 2.507486374 0.9990618
## 970x250-620x366 3.140562e-01 0.1752312601 0.452881190 0.0000000
## 970x500-620x366 -2.000000e-01 -0.7831313713 0.383131371 0.9999994
## 970x66-620x366 -2.000000e-01 -1.9074863742 1.507486374 1.0000000
## 970x90-620x366 7.941176e-02 -0.1191157519 0.277939281 0.9999705
## 640x480-640x360 1.245421e-01 -0.5929115195 0.841995769 1.0000000
## 728x90-640x360 -2.660915e-01 -0.9613014033 0.429118322 0.9999899
## 750x570-640x360 3.333333e-01 -1.5052261904 2.171892857 1.0000000
## 970x250-640x360 -1.526104e-01 -0.8483573607 0.543136477 1.0000000
## 970x500-640x360 -6.666667e-01 -1.5637918534 0.230458520 0.6045215
## 970x66-640x360 -6.666667e-01 -2.5052261904 1.171892857 0.9999974
## 970x90-640x360 -3.872549e-01 -1.0973285455 0.322818742 0.9860788
## 728x90-640x480 -3.906337e-01 -0.5702336752 -0.211033656 0.0000000
## 750x570-640x480 2.087912e-01 -1.5027112111 1.920293629 1.0000000
## 970x250-640x480 -2.771526e-01 -0.4588203561 -0.095484776 0.0000026
## 970x500-640x480 -7.912088e-01 -1.3859969983 -0.196420584 0.0001820
## 970x66-640x480 -7.912088e-01 -2.5027112111 0.920293629 0.9992829
## 970x90-640x480 -5.117970e-01 -0.7423268290 -0.281267224 0.0000000
## 750x570-728x90 5.994249e-01 -1.1028728538 2.301722602 0.9999988
## 970x250-728x90 1.134811e-01 0.0737292770 0.153232921 0.0000000
## 970x500-728x90 -4.005751e-01 -0.9683339062 0.167183655 0.7239621
## 970x66-728x90 -4.005751e-01 -2.1028728538 1.301722602 1.0000000
## 970x90-728x90 -1.211634e-01 -0.2685438597 0.026217137 0.3519013
## 970x250-750x570 -4.859438e-01 -2.1884609048 1.216573355 1.0000000
## 970x500-750x570 -1.000000e+00 -2.7942503734 0.794250373 0.9806289
## 970x66-750x570 -1.000000e+00 -3.4072394821 1.407239482 0.9999287
## 970x90-750x570 -7.205882e-01 -2.4290101331 0.987833663 0.9998993
## 970x500-970x250 -5.140562e-01 -1.0824724941 0.054360044 0.1603141
## 970x66-970x250 -5.140562e-01 -2.2165733546 1.188460905 1.0000000
## 970x90-970x250 -2.346445e-01 -0.3845378679 -0.084751052 0.0000010
## 970x66-970x500 2.258194e-13 -1.7942503734 1.794250373 1.0000000
## 970x90-970x500 2.794118e-01 -0.3064532872 0.865276817 0.9986701
## 970x90-970x66 2.794118e-01 -1.4290101331 1.987833663 1.0000000
## ================================================================================
# 7. FINAL SUMMARY
cat("\n")
cat(strrep("=", 80), "\n")
## ================================================================================
cat("FINAL SUMMARY: ARE THERE SYSTEMATIC DIFFERENCES?\n")
## FINAL SUMMARY: ARE THERE SYSTEMATIC DIFFERENCES?
cat(strrep("=", 80), "\n\n")
## ================================================================================
# Check each category for systematic differences
check_differences <- function(data, factor_var) {
if (length(unique(data[[factor_var]])) > 1) {
# Run ANOVA
aov_test <- aov(BID_WON_numeric ~ factor(data[[factor_var]]), data = data)
p_value <- summary(aov_test)[[1]]$"Pr(>F)"[1]
# Calculate range of win rates
win_rates <- data %>%
group_by(!!sym(factor_var)) %>%
summarise(win_rate = mean(BID_WON_numeric), .groups = "drop")
range_diff <- max(win_rates$win_rate) - min(win_rates$win_rate)
ratio <- max(win_rates$win_rate) / min(win_rates$win_rate)
return(list(
has_differences = p_value < 0.05,
p_value = p_value,
range = range_diff,
ratio = ratio,
levels = nrow(win_rates)
))
} else {
return(list(
has_differences = FALSE,
p_value = NA,
range = 0,
ratio = 1,
levels = 1
))
}
}
# Check all factors
factors_to_check <- list(
"Device Type" = "DEVICE_TYPE_clean",
"Region" = "DEVICE_GEO_REGION_clean",
"Ad Size" = "SIZE"
)
cat("Systematic Differences Analysis:\n")
## Systematic Differences Analysis:
cat(strrep("-", 80), "\n")
## --------------------------------------------------------------------------------
for (factor_name in names(factors_to_check)) {
factor_var <- factors_to_check[[factor_name]]
result <- check_differences(bids_analysis, factor_var)
cat(factor_name, ":\n")
cat(" Number of categories:", result$levels, "\n")
cat(" Statistical significance (p-value):",
ifelse(is.na(result$p_value), "N/A",
paste(round(result$p_value, 4),
ifelse(result$p_value < 0.05, "**SIGNIFICANT**", "not significant"))), "\n")
cat(" Win rate range:", round(result$range, 4), "\n")
cat(" Best to worst ratio:", round(result$ratio, 2), "x\n")
cat(" Conclusion:",
ifelse(result$has_differences,
"YES - Systematic differences exist",
"NO - No systematic differences"), "\n")
cat(strrep("-", 40), "\n")
}
## Device Type :
## Number of categories: 5
## Statistical significance (p-value): 0 **SIGNIFICANT**
## Win rate range: 0.2506
## Best to worst ratio: 2.12 x
## Conclusion: YES - Systematic differences exist
## ----------------------------------------
## Region :
## Number of categories: 1
## Statistical significance (p-value): N/A
## Win rate range: 0
## Best to worst ratio: 1 x
## Conclusion: NO - No systematic differences
## ----------------------------------------
## Ad Size :
## Number of categories: 38
## Statistical significance (p-value): 0 **SIGNIFICANT**
## Win rate range: 1
## Best to worst ratio: Inf x
## Conclusion: YES - Systematic differences exist
## ----------------------------------------
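Two details in the summary above deserve a caveat. The "Inf x" ratio for Ad Size arises because the worst ad size has a win rate of 0 (the range is 1), and the p-values print as 0 only because they are rounded to four decimals. The following is a minimal sketch of how both could be reported more carefully; `safe_ratio` and `format_p` are hypothetical helper names that are not part of the original analysis.

```r
# Hypothetical helper: best-to-worst win-rate ratio that returns NA
# instead of Inf when the worst group never wins.
safe_ratio <- function(win_rates) {
  lo <- min(win_rates, na.rm = TRUE)
  hi <- max(win_rates, na.rm = TRUE)
  if (lo == 0) NA_real_ else hi / lo
}

# Hypothetical helper: format.pval() reports very small p-values as
# "less than eps" instead of rounding them down to 0.
format_p <- function(p) format.pval(p, digits = 4, eps = 1e-4)

safe_ratio(c(0, 0.25, 1))   # NA: ratio is undefined when the minimum is 0
safe_ratio(c(0.2, 0.4))     # 2
format_p(1e-12)             # a "<..." string rather than 0
```

With many small ad-size groups, a win rate of exactly 0 or 1 may simply reflect a handful of bids, so it could also make sense to filter out low-count categories before comparing them.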
Visual representation of the data on a geographic map
Q: How does proximity to Portland affect the price of a bid?
Quick comparison of price between the raw and cleaned data. The mean barely changes (0.446 before vs 0.442 after), so the outliers did not shift the overall trend much. The standard deviation, however, drops from 5.48 to 0.658 once extreme values such as the -999 minimum are removed.
# price
bids_clean_2 <- bids_clean
This is not strictly needed for the presentation, but it was central to the original Q4: it shows that the values we fixed did not materially change the summary statistics.
library(hexbin)
library(geosphere)
col_comparison <- function(df_1, df_2, col_1, col_2, label_1 = "Before", label_2 = "After") {
bind_rows(
df_1 %>%
summarise(
mean = mean(.data[[col_1]], na.rm = TRUE),
sd = sd(.data[[col_1]], na.rm = TRUE),
min = min(.data[[col_1]], na.rm = TRUE),
max = max(.data[[col_1]], na.rm = TRUE),
n = n(),
na_count = sum(is.na(.data[[col_1]]))
) %>%
mutate(dataset = label_1, .before = 1),
df_2 %>%
summarise(
mean = mean(.data[[col_2]], na.rm = TRUE),
sd = sd(.data[[col_2]], na.rm = TRUE),
min = min(.data[[col_2]], na.rm = TRUE),
max = max(.data[[col_2]], na.rm = TRUE),
n = n(),
na_count = sum(is.na(.data[[col_2]]))
) %>%
mutate(dataset = label_2, .before = 1)
)
}
col_comparison(bids, bids_clean, "PRICE_clean", "PRICE_final")
## # A tibble: 2 × 7
## dataset mean sd min max n na_count
## <chr> <dbl> <dbl> <dbl> <dbl> <int> <int>
## 1 Before 0.446 5.48 -999 141. 441535 0
## 2 After 0.442 0.658 0.000071 10.00 440983 0
We binned Oregon into a hexagonal grid (60 bins across the longitude range) and grouped the bids by hex. For each hex we computed the average bid price and the average winning bid price, then plotted both against the hex's distance to Portland.
#------------------------------------------------------------------------
# Group Zips into Hexes and categorize hex by city with most frequent
# zip per hex
#------------------------------------------------------------------------
bids_clean <- bids_clean_2
# Add city based on major_city per hex
hb <- hexbin(
x = bids_clean$DEVICE_GEO_LONG_clean,
y = bids_clean$DEVICE_GEO_LAT_clean,
xbins = 60, # number of hex bins across the x (longitude) range; note the map below uses bins = 30
IDs = TRUE
)
# Get cell assignments for each point
bids_clean$hex_id <- hb@cID
zip_city_lookup <- zipcodeR::zip_code_db %>%
filter(state == "OR") %>%
select(zipcode, major_city)
# Add city based on zip to bids_clean.
bids_clean <- bids_clean %>%
left_join(
zip_city_lookup %>% select(zipcode, city = major_city),
by = c("DEVICE_GEO_ZIP_clean" = "zipcode")
)
# Find top ZIP per hex and join city names
hex_with_city <- bids_clean %>%
# Remove rows where hex_id or ZIP is missing
filter(!is.na(hex_id), !is.na(DEVICE_GEO_ZIP_clean)) %>%
# Count how many bids per hex + ZIP combination
count(hex_id, DEVICE_GEO_ZIP_clean, name = "zip_count") %>%
# Keep only the most frequent ZIP for each hex (1 row per hex)
slice_max(zip_count, n = 1, by = hex_id, with_ties = FALSE) %>%
# Attach city name by matching ZIP to zipcodeR lookup table
left_join(zip_city_lookup, by = c("DEVICE_GEO_ZIP_clean" = "zipcode")) %>%
# Keep only the columns we need for the final join
select(hex_id, zip_count, major_city)
# Join to bids
bids_clean <- bids_clean %>%
left_join(hex_with_city, by = "hex_id")
ggplot() +
geom_sf(data = zip_code_db, fill = "white", color = "gray80") +
stat_summary_hex(data = bids_clean,
aes(x = DEVICE_GEO_LONG_clean, y = DEVICE_GEO_LAT_clean, z = PRICE_final),
fun = mean, bins = 30, alpha = 0.8) +
theme_minimal()
Similar to the above, but without grouping bids into regions: each bid is plotted individually, with its distance to Portland calculated from its latitude/longitude.
#------------------------------------------------------------------------
# Calculate distance each hex is from Portland's hex
#------------------------------------------------------------------------
# Check if 'dist_to_portland_km' exists in the bids_clean data frame
if ("dist_to_portland_km" %in% colnames(bids_clean)) {
cat("'dist_to_portland_km' exists in bids_clean\n")
} else {
cat("Adding 'dist_to_portland_km' to bids_clean\n")
# Step 1: Get hex centers from the hexbin object
hex_centers <- tibble(
hex_id = hb@cell,
hex_x = hb@xcm, # longitude center
hex_y = hb@ycm # latitude center
)
# Step 2: Find Portland's hex (by city name or coordinates)
# Option A: By city name (if you have major_city in hex_with_city)
portland_hex_id <- hex_with_city %>%
filter(major_city == "Portland") %>%
pull(hex_id) %>%
first()
# Step 3: Get Portland's hex center
portland_center <- hex_centers %>%
filter(hex_id == portland_hex_id)
# Step 4: Haversine distance from each hex center to Portland's hex center
hex_distances <- hex_centers %>%
rowwise() %>%
mutate(
dist_to_portland_km = distHaversine(
c(hex_x, hex_y), # this hex
c(portland_center$hex_x, portland_center$hex_y) # Portland hex
) / 1000 # meters to km
) %>%
ungroup()
# Step 5: Join back to bids_clean
bids_clean <- bids_clean %>%
left_join(hex_distances %>% select(hex_id, dist_to_portland_km), by = "hex_id")
# Calculate mean price per hex with distance
hex_price_distance_all <- bids_clean %>%
filter(!is.na(dist_to_portland_km), !is.na(PRICE_final)) %>%
group_by(hex_id, dist_to_portland_km) %>%
summarise(
avg_price = mean(PRICE_final, na.rm = TRUE),
n_bids = n(),
.groups = "drop"
)
# Calculate mean winning-bid price per hex with distance
hex_price_distance_winning <- bids_clean %>%
filter(!is.na(dist_to_portland_km), !is.na(PRICE_final), BID_WON_clean == TRUE) %>%
group_by(hex_id, dist_to_portland_km) %>%
summarise(
avg_price = mean(PRICE_final, na.rm = TRUE),
n_bids = n(),
.groups = "drop"
)
}
## Adding 'dist_to_portland_km' to bids_clean
p1 <- ggplot(hex_price_distance_all, aes(x = dist_to_portland_km, y = avg_price)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", se = TRUE, color = "red") +
labs(
title = "Average Bid Price vs Distance from Portland",
x = "Distance from Portland (km)",
y = "Average Price"
) +
scale_fill_viridis_d(option = "cividis") +
theme_minimal()
p2 <- ggplot(hex_price_distance_winning, aes(x = dist_to_portland_km, y = avg_price)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", se = TRUE, color = "red") +
labs(
title = "Average Winning Bid Price vs Distance from Portland",
x = "Distance from Portland (km)",
y = "Average Price"
) +
scale_fill_viridis_d(option = "cividis") +
theme_minimal()
p1
p2
#------------------------------------------------------------------------
# Calculate distance each bid lat/long is from Portland
#------------------------------------------------------------------------
# Portland center coordinates
portland_lat <- 45.52
portland_lon <- -122.68
if ("dist_to_portland_row_km_bid" %in% colnames(bids_clean)) {
cat("'dist_to_portland_row_km_bid' exists in bids_clean\n")
} else {
cat("Adding 'dist_to_portland_row_km_bid' to bids_clean\n")
bids_clean <- bids_clean %>%
mutate(
dist_to_portland_row_km_bid = distHaversine(
cbind(DEVICE_GEO_LONG_clean, DEVICE_GEO_LAT_clean),
c(portland_lon, portland_lat)
) / 1000
)
}
## Adding 'dist_to_portland_row_km_bid' to bids_clean
ggplot(bids_clean %>% filter(!is.na(dist_to_portland_row_km_bid), !is.na(PRICE_final), BID_WON_clean == TRUE) %>% sample_n(10000),
aes(x = dist_to_portland_row_km_bid, y = PRICE_final)) +
geom_point(alpha = 0.1, size = 0.3) +
geom_smooth(method = "loess", color = "red") +
labs(title = "Individual Winning Bids (10k sample)", x = "Distance (km)", y = "Price") +
theme_minimal()
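For reference, the distance computed by `distHaversine()` above can be sketched in base R. This is an illustrative re-implementation under stated assumptions, not the `geosphere` source: `haversine_km` is a hypothetical helper, and the Salem coordinates are approximate values chosen only for the example.

```r
# Base-R sketch of the Haversine great-circle distance, returned in km.
# earth_radius_km mirrors geosphere::distHaversine()'s default radius of 6378137 m.
haversine_km <- function(lon1, lat1, lon2, lat2, earth_radius_km = 6378.137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * earth_radius_km * asin(pmin(1, sqrt(a)))
}

portland_lon <- -122.68
portland_lat <- 45.52
# Approximate coordinates for Salem, OR (illustrative, not taken from the dataset)
salem_lon <- -123.04
salem_lat <- 44.94

haversine_km(salem_lon, salem_lat, portland_lon, portland_lat) # roughly 70 km
```

Because the function is vectorized over its first two arguments, it could be applied to whole longitude/latitude columns in the same way as the `mutate()` call above.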
# Top populated urban areas (> 50,000) that are present in bids_clean$city
urban_cities <- c("Portland", "Salem", "Eugene", "Gresham", "Hillsboro", "Bend", "Beaverton", "Medford", "Springfield", "Corvallis", "Albany")
# add urban/rural flag
bids_clean <- bids_clean %>%
mutate(area_type = case_when(
city %in% urban_cities ~ "Urban",
TRUE ~ "Rural"
))
urban_summary <- bids_clean %>%
group_by(area_type) %>%
summarise(
n_bids = n(),
avg_price = mean(PRICE_final, na.rm = TRUE),
median_price = median(PRICE_final, na.rm = TRUE),
sd_price = sd(PRICE_final, na.rm = TRUE),
win_rate = mean(BID_WON_clean == TRUE, na.rm = TRUE),
avg_response_time = mean(RESPONSE_TIME_clean, na.rm = TRUE),
n_cities = n_distinct(city)
) %>% print()
## # A tibble: 2 × 8
## area_type n_bids avg_price median_price sd_price win_rate avg_response_time
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Rural 85373 0.489 0.22 0.689 0.293 207.
## 2 Urban 355610 0.431 0.193 0.650 0.268 200.
## # ℹ 1 more variable: n_cities <int>
# urban_summary
# Multiple metrics side-by-side
urban_summary %>%
pivot_longer(cols = c(avg_price, sd_price, median_price, win_rate, avg_response_time),
names_to = "metric", values_to = "value") %>%
ggplot(aes(x = area_type, y = value, fill = area_type)) +
scale_fill_viridis_d(option = "cividis") +
geom_col() +
facet_wrap(~metric, scales = "free_y", nrow = 1) +
labs(title = "Urban vs Rural Comparison", x = "", caption = "85373 rural bids, 355610 urban bids") +
theme_minimal() +
theme(legend.position = "none")
population_or <- read_csv(here("data", "2025 Preliminary Population Estimates.csv"))
population_or <- population_or %>%
rename(city = `Incorporated City/Town`,
population = `Revised Population Estimate july 1, 2024`)
# Add population to bids_clean by matching major_city to the population table.
bids_clean <- bids_clean %>%
select(-any_of("population")) %>% # remove if exists
left_join(
population_or %>% select(city, population),
by = c("major_city" = "city")
)
bids_clean %>%
filter(BID_WON_clean == TRUE, !is.na(population)) %>%
head(10)
## # A tibble: 10 × 27
## row_id DATE_UTC_clean TIMESTAMP_clean AUCTION_ID PUBLISHER_ID PRICE_final
## <int> <date> <dttm> <chr> <chr> <dbl>
## 1 3 2025-10-21 2025-10-21 23:42:37 0000060c-… LteIcOiSsaE5 0.23
## 2 9 2025-10-22 2025-10-22 02:43:14 00000e70-… 3 0.72
## 3 13 2025-10-21 2025-10-21 21:57:38 00001359-… LteIcOiSsaE5 0.352
## 4 15 2025-10-22 2025-10-22 00:08:29 0000b6e8-… LteIcOiSsaE5 0.11
## 5 20 2025-10-22 2025-10-22 04:31:41 00011547-… 243 0.765
## 6 24 2025-10-22 2025-10-22 03:48:57 000196e3-… 3 0.27
## 7 25 2025-10-21 2025-10-21 23:05:35 0001bca2-… 243 1.06
## 8 28 2025-10-22 2025-10-22 04:13:46 0003cbbc-… 3 0.809
## 9 30 2025-10-22 2025-10-22 00:15:39 00057800-… 0b29abca-22… 0.945
## 10 36 2025-10-21 2025-10-21 21:12:42 0005a93e-… LteIcOiSsaE5 2.66
## # ℹ 21 more variables: DEVICE_GEO_REGION_clean <chr>,
## # DEVICE_GEO_ZIP_clean <chr>, DEVICE_GEO_CITY_clean <chr>,
## # DEVICE_GEO_LAT_clean <dbl>, DEVICE_GEO_LONG_clean <dbl>,
## # BID_WON_clean <chr>, RESPONSE_TIME_clean <int>, DEVICE_TYPE_clean <chr>,
## # SIZE <chr>, REQUESTED_SIZES_clean <list>, hour <int>, day_of_week <fct>,
## # pred_prob <dbl>, hex_id <int>, city <chr>, zip_count <int>,
## # major_city <chr>, dist_to_portland_km <dbl>, …
bids_clean %>%
filter(BID_WON_clean == TRUE, !is.na(population)) %>%
ggplot(aes(x = population, y = PRICE_final)) +
geom_point(alpha = 0.1) +
geom_smooth(method = "loess", color = "red") +
labs(
title = "Winning Bid Price vs City Population",
x = "Population",
y = "Price"
) +
scale_x_continuous(labels = scales::comma) +
theme_minimal()
city_grouping_winning_bids <- bids_clean %>%
filter(BID_WON_clean == TRUE, !is.na(population), !is.na(DEVICE_GEO_ZIP_clean)) %>%
group_by(city) %>%
summarise(
avg_price = mean(PRICE_final, na.rm = TRUE),
n_bids = n(),
avg_population = mean(population, na.rm = TRUE),
ave_km_to_pdx = mean(dist_to_portland_row_km_bid, na.rm = TRUE),
.groups = "drop"
) %>% filter (n_bids > 100) %>%
mutate(km_to_pdx_o_pop = avg_population/ave_km_to_pdx)
city_grouping_winning_bids %>%
ggplot(aes(x = avg_population, y = avg_price, size = n_bids)) +
geom_point(alpha = 0.5) +
labs(
title = "Average Winning Bid Price vs City Population (by City)",
x = "Population",
y = "Average Winning Price",
size = "# Bids"
) +
scale_x_continuous(labels = scales::comma) +
theme_minimal()
# bids_clean <- bids_clean %>%
# filter(!is.na(dist_to_portland_km), !is.na(PRICE_final)) %>%
# group_by(hex_id, dist_to_portland_km) %>%
# summarise(
# avg_price = mean(PRICE_final, na.rm = TRUE),
# n_bids = n(),
# .groups = "drop"
# )
# urban <- population_or %>% filter(population > 50000) %>% arrange(desc(population))
city_grouping_winning_bids <- city_grouping_winning_bids %>%
mutate(
pop_bucket = cut(
avg_population,
breaks = seq(0, max(avg_population, na.rm = TRUE) + 10000, by = 1000),
labels = FALSE,
include.lowest = TRUE,
right = TRUE
)
)
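As a quick illustration of what `cut()` does with `labels = FALSE` in the bucketing step above, here is a small sketch on toy values. The populations below are hypothetical and not taken from the dataset.

```r
# Toy populations (hypothetical values) bucketed into 1,000-person intervals.
toy_pop <- c(500, 1000, 1001, 25000)

toy_bucket <- cut(
  toy_pop,
  breaks = seq(0, 30000, by = 1000),
  labels = FALSE,        # return the integer index of the interval
  include.lowest = TRUE, # the first interval is [0, 1000]
  right = TRUE           # later intervals are (1000, 2000], (2000, 3000], ...
)

toy_bucket # 1 1 2 25
```

Each bucket index is therefore roughly the population in thousands, rounded up, which is worth keeping in mind when labeling an axis that uses `pop_bucket`.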
# Average the city-level prices within each 1,000-person population bucket.
# Named pop_bucket_plot rather than `t`, which would mask base::t().
pop_bucket_plot <- city_grouping_winning_bids %>%
filter(!is.na(avg_population), !is.na(avg_price)) %>%
group_by(pop_bucket) %>%
summarise(
avg_price = mean(avg_price, na.rm = TRUE),
n_cities = n(), # rows here are cities, not individual bids
.groups = "drop"
) %>%
ggplot(aes(x = pop_bucket, y = avg_price)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "loess", color = "red", se = TRUE) +
labs(
title = "Average Winning Bid Price vs City Population (by Population Bucket)",
x = "Population bucket (1,000 residents per bucket)",
y = "Average Winning Price"
) +
scale_x_continuous(labels = scales::comma) +
theme_minimal()
pop_bucket_plot
#Data Visualization
#Synthesis
*Notes: Things to make clear in the synthesis portion, for our own use:
What were the points of uncertainty? What steps did we take to address them, and how could they still be showing up? How did we clean and prepare the data set for our work? What choices did we make, and how did we standardize?
#Presentation Assignment:
##We will decide who is generating what info and transferring it to the slides.
##We also need to decide who will present what and how to field questions.
Final presentation ask: Your final presentation, in 20–30 minutes, should tell a clear, professional story of your team’s data-cleaning workflow, exploratory analysis, and collaboration practices. The goal is to demonstrate not only what you found, but how you worked as a data science team.
Things we need to include in the final presentation:
Key Issues You Found. Examples include:
- Missingness patterns
- Formatting inconsistencies (dates, numeric/character mismatches, categorical typos)
- Duplicates
- Outliers or implausible values
- Structural issues (names, types, consistency)

Your Cleaning Strategy. For each issue:
- What you observed
- Why it was a problem
- Principle behind your fix (e.g., “We chose this imputation strategy because…”)
- Concise R code or pseudo-code

Focus on justifying decisions, not on full code dumps.

Reproducibility and Workflow. Highlight:
- Git usage (branches, PRs, merge conflicts)
- Script modularity and readability
- R Markdown / Quarto documentation
- Naming conventions and folder structure

EDA Overview Patterns. Show:
- Distributions
- Key relationships
- Missingness profiles
- Surprising patterns

Deep Dives on Guiding Questions. For each:
- State the question
- Show relevant figures
- Interpret results clearly
- Connect to cleaning decisions

Visualization Quality. Ensure:
- Clear labels/titles
- Good color choices
- No clutter
- Interpretation accompanies each visual

Insights. Summarize:
- 3–5 most important findings
- What the data suggests overall
- Uncertainties or next steps

Collaboration Reflection. Discuss:
- What went well in your workflow
- What was challenging
- Lessons learned for future projects
- How GitHub, Jira, and RStudio supported collaboration

GitHub Repository Checklist. Confirm your repo includes:
- Clean README
- Data cleaning scripts
- EDA scripts (.qmd or .Rmd)
- Clear folder structure (data/, R/, figs/)
- Kanban snapshot
- Evidence of teamwork (commit history, pull requests, etc.)